PREFACE



Visual problem solving has been successful for millennia. The Pythago- 
rean Theorem was proved by visual means more than 2000 years ago. The 
entire study of geometry existed as a visual problem-solving field more than 
one and a half millennia before Rene Descartes invented symbolic coordi- 
nates. Albert Einstein wrote in 1953 that the development of Western Sci- 
ence is based on two great achievements: the invention of the formal logic 
system (in Euclidean geometry) and reasoning based on systematic experi- 
mentation during the Renaissance. In the context of this book, it is important 
to notice that the formal logical system in Euclidean geometry was visual. 

Consider two other important historical examples of visual problem solv- 
ing and decision making. Maritime navigation by using the stars presents an 
example of sophisticated visual problem solving and decision-making. Then 
in the 19th century, John Snow stopped a cholera epidemic in London by 
proposing that a specific water pump be shut down. He discovered that pump 
by visually correlating data on the city map. Of course, there continue to be 
many current examples of advanced visual problem solving and decision- 
making. 

This book presents the current trends in visual problem solving and
decision making, drawing a clear distinction between the visualization of
an already identified solution and visually finding a solution. Thus, the
book focuses on two goals:

(G1) displaying a result or solution visually, and

(G2) deriving a result or solution by visual means. 

The first goal has two aspects: G1(a) displaying results to a novice and
G1(b) convincing a decision maker. Recently mass media (US News and
World Report, Dec. 2003, p.30) reported that intelligence analysts knew the
danger of the coming September 11 attacks, but convincing decision makers
was one of their major challenges: “There were people who got it at the
analyst level, at the supervisory level, but all of us were outnumbered”.
A novice simply does not know the subject but has no prejudice, priorities,
special interests or other preconceived notions that may prevent the
digesting of information. Decision makers may have all of these
characteristics.

The second goal, G2, is even more difficult to achieve, especially for
non-structured problems. Obviously, there are many intermediate goals that
fall between the two extremes, and the journey from G1 to G2 is not a short,
non-stop flight. This is the reason that we consider this book to be the
first step in a future series “Visual decision making and problem solving.”

A typical example of G1 is the animation of a known algorithm for a 
novice. Here the intention is to show visually the algorithm’s steps. A differ- 
ent situation arises relative to G2 when we use animation to discover proper- 
ties of an algorithm visually such as the number of loops and the amount of 
space required for the task. If the animation tool permits viewing loops and 
space used, then it can serve as a visual problem-solving tool. Thus, both 
tasks might use the same technique. This observation shows that the essence 
of the transition from G1 to G2 is not the visual technique itself (e.g. anima- 
tion). The essence of achieving G2 is in matching a decision- 
making/problem-solving task with a visual technique. The same type of 
matching may be required to convince a decision maker. Note that a simple
animation technique appropriate for the novice would likely not be
sufficient for advanced visual decision making on algorithm efficiency by
an experienced analyst. On the other hand, there are also situations in
which, after a solution is discovered through the use of sophisticated
analytical and visual means, a very simple visualization is sufficient and
desirable for convincing a decision maker.

We use the term brute force visual problem solving for the approach in 
which every available visual and/or analytical technique is tried for the task. 
A task-driven approach is a better alternative and is one theme that is dis- 
cussed throughout the book. It can be implemented at a variety of levels 
from the global decision level to the subpixel sensor level. Such
approaches range from the automatic generation or selection of a visual
tool based on a user’s query to the automatic generation of composite icons
matched to the user’s queries. Task-driven approaches can also involve the
use of hierarchical visual decision-making models and the recording of the
analyst’s visual decision-making procedures.

The book emphasizes the difference between visual decision making and 
visual data mining. Visual data mining discovers useful regularities visually 
or visualizes patterns discovered by common data mining tools such as neu- 
ral networks and decision trees. We will cover this subject along with a look 
at spatial data mining, which combines analytical and visual mining of spa- 
tial data. Visual decision making relies heavily on visual data mining, but 
useful regularities are only a part of the entire decision-making process. In 
finance, visual data mining helps to discover market trends. Visual
investment decision making is used to produce buy/sell signals and to help
select a portfolio. In medicine, visual data mining helps with patient
diagnosis. A decision on a course of treatment is heavily based on the
diagnosis, but the diagnosis is only a part of a treatment decision. For
instance, a treatment decision should
avoid a patient’s allergic reactions and other negative side effects. Diagnos- 
tic data mining does not cover this issue. In defense applications, visual data 
mining can provide valuable intelligence clues for planning an operation. A 
visual decision about a military strike requires that much more information 
be considered, including the availability of military forces and political reali- 
ties. Visual data mining is also useful in breaking drug trafficking rings, but 
the actual decision on these operations needs to involve more aspects of the 
problem. 

The foundations and applications of visual problem solving and decision 
making are many and varied. To organize these topics, the book has been 
divided into five parts: (1) visual problem solving and decision making, (2) 
visual and heterogeneous reasoning, (3) visual correlation, (4) visual and 
spatial data mining, and (5) visual and spatial problem solving in
geospatial domains.

As noted, Part 1 addresses visual decision making and is divided into two
chapters. Chapter 1 provides a broad overview of current trends in visual
decision making and problem solving. This overview further differentiates
between the visualization of a solution and the generation of a solution
visually. The chapter describes general, hierarchical, visual
decision-making models using structural information and ontologies.

Chapter 2 provides an extensive discussion of the efficiency of
information visualization techniques. It suggests an informal model (called
the information visualization value stack model) that predicts a problem
area called the “sweet spot”, where information visualization will most
likely achieve utilization. This model is based on a set of qualitative
problem parameters identified in the chapter.

Visual and heterogeneous reasoning is the focus of Part 2 and consists of 
Chapters 3-7. Reasoning plays a critical role in decision making and problem 
solving. Chapter 3 provides a comparative analysis of visual and verbal (sen- 
tential) reasoning approaches and their combination called heterogeneous 
reasoning. It is augmented with a description of application domains of vis- 
ual reasoning. One of the conclusions of this chapter is that the fundamental 
iconic reasoning approach introduced by Charles Peirce is the most compre- 
hensive heterogeneous reasoning approach. 

Chapter 4 describes a computational architecture for applications that 
support heterogeneous reasoning. Heterogeneous reasoning is, in its most 
general form, reasoning that employs representations drawn from multiple 
representational forms. Of particular importance, and the principal focus of 
this architecture, is heterogeneous reasoning that employs one or more forms
of graphical representation, perhaps in combination with sentences (of Eng- 
lish or another language, whether natural or scientific). The architecture is 
based on the model of natural deduction in formal logic. This chapter de- 
scribes and motivates modifications to the standard logical model necessary 
to capture a wide range of heterogeneous reasoning tasks. 

Chapter 5 provides a discussion of mathematical visual symbolism for 
problem solving based on an algebraic approach. It is formulated as lessons 
that can be learned from history. Visual formalism is contrasted with text 
through the history of algebra beginning with Diophantus’ contribution to 
algebraic symbolism nearly 2000 years ago. Along the same lines, it is 
shown that the history of art provides valuable lessons. The evident
historical success provides a positive indication that similar success can
be repeated for modern decision-making and analysis tasks. Thus, this chapter
presents the lessons from history tuned to new formalizations in the form of 
iconic equations and iconic linear programming. 

Chapter 6 describes an iconic reasoning architecture for analysis and
decision-making along with a storytelling iconic reasoning approach. The
approach provides visuals for task identification, evidence, reasoning
rules, links of evidence with pre-hypotheses, and evaluation of hypotheses.
The iconic storytelling approach is consistent hierarchical reasoning that
includes a variety of rules such as visual search-reasoning rules that are
tools for finding confirming links. The chapter also provides a review of
related work on
iconic systems. The review discusses concepts and terminology, controversy 
in iconic language design, links between iconic reasoning and iconic lan- 
guages and requirements for an efficient iconic system. 

Chapter 7 considers directions for visual reasoning and discovery. Cur- 
rently, computer visualization is moving from a pure illustration domain to 
visual reasoning, discovery, and decision making. This trend is associated
with new terms such as visual data mining, visual decision making, and het- 
erogeneous, iconic and diagrammatic reasoning. Beyond a new terminology, 
the trend itself is not new as the early history of mathematics clearly shows. 
This chapter demonstrates that we can learn valuable lessons from the his- 
tory of mathematics for visual reasoning and discovery. 

Visual correlation is the thrust of Part 3, which consists of Chapters 8-10.
Chapter 8 introduces the concept of visual correlation and describes the es- 
sence of a generalized correlation to be used for multilevel and conflicting 
data. Several categories of visual correlation are presented accompanied by 
both numeric and non-numeric examples with three levels (high, medium 
and low) of coordination. The chapter presents examples of multi-type visual 
correlations. The chapter also provides a classification of visual correlation 
methods with corresponding metaphors and criteria for visual correlation 
efficiency. The chapter finishes with a more formal treatment of visual
correlation, providing formal definitions, analysis, and theory.

Chapter 9 presents the state of the art in iconic descriptive approaches to
annotating, searching, and correlating that are based on the concepts of
compound and composite icons, the iconic annotation process, and iconic
queries. Specific iconic languages used for applications such as video
annotation, military use and text annotation are discussed. Graphical
coding principles are derived through the consideration of questions such
as: How much information can a small icon convey? How many attributes can
be displayed on a small icon either explicitly or implicitly? The chapter
also summarizes the impact of human perception on icon design.

Chapter 10 addresses the problem of visually correlating objects and 
events. The new Bruegel visual correlation system based on an iconographic 
language that permits compact information representation is described. The 
description includes the Bruegel concept, functionality, the ability to com- 
press information via iconic semantic zooming, and dynamic iconic sen- 
tences. The formal Bruegel iconic language for automatic icon generation is 
outlined. The chapter is devoted to case studies that describe how Bruegel 
iconic architecture can be used. 

Part 4 addresses visual and spatial data mining and consists of Chapters 
11-16. Chapter 11 introduces two dynamic visualization techniques using 
multi-dimensional scaling to analyze transient data streams such as news- 
wires and remote sensing imagery. The chapter presents an adaptive visuali- 
zation technique based on data stratification to ingest stream information 
adaptively when influx rate exceeds processing rate. It also describes an in- 
cremental visualization technique based on data fusion to project new infor- 
mation directly onto a visualization subspace spanned by the singular vectors 
of the previously processed neighboring data. The ultimate goal is to lever- 
age the value of legacy and new information and minimize re-processing of 
the entire dataset in full resolution.

In Chapter 12, the main objective of the described spatial data mining 
platform called SPIN! is to provide an open, highly extensible, n-tier system 
architecture based on the Java 2 Platform, Enterprise Edition. The data min- 
ing functionality is distributed among (i) Java client application for visuali- 
zation and workspace management, (ii) application server with Enterprise 
Java Bean container for running data mining algorithms and workspace 
management, and (iii) spatial database for storing data and spatial query exe- 
cution. In the SPIN! system, visual problem solving involves displaying data 
mining results, using visual data analysis tools, and finally producing a solu- 
tion based on linked interactive displays with different visualizations of 
various types of knowledge and data. 

Chapter 13 begins by looking at the Predictive Model Markup Language
(PMML), an XML-based industrial standard for the platform- and system- 
independent representation of data mining models. VizWiz, a tool for the 
visualization and evaluation of data mining models that are specified in 
PMML, is presented. This tool allows for the highly interactive visual explo- 
ration of a variety of data mining result types such as decision trees, classifi- 
cation and association rules or subgroups. 

Chapter 14 describes new neural-network techniques developed for the 
visual mining of clinical electroencephalograms (EEGs). These techniques 
exploit the fruitful ideas of the Group Method of Data Handling (GMDH). 
The chapter briefly describes the standard neural-network techniques that are 
able to learn well-suited classification modes from data presented by rele- 
vant features. It then introduces and applies an evolving cascade neural net- 
work technique that adds new input nodes as well as new neurons to the 
network while the training error decreases. The chapter also presents the 
GMDH-type polynomial networks trained from data. New neural-network 
techniques developed to derive multi-class concepts from data are described 
and applied. 

Chapter 15 discusses how to represent scientific visualization and data 
mining tasks in a simpler form so that visual solutions become possible. 
Visualization is used in data mining for the visual presentation of already 
discovered patterns and for discovering new patterns visually. Success in 
both tasks depends on the ability of presenting abstract patterns as simple 
visual patterns. A new approach called inverse visualization (IV) is sug- 
gested for addressing the problem of visualizing complex patterns. The ap- 
proach is based on specially designed data preprocessing, which is based on 
a transformation theorem proved in this chapter. A mathematical formalism 
is derived from the Representative Measurement Theory. The possibility of 
solving inverse visualization tasks is illustrated on functional non-linear ad- 
ditive dependencies. 

Chapter 16 describes a new technique for extracting patterns and rela- 
tions visually from multidimensional binary data. The proposed method re- 
lies on monotone structural relations between Boolean vectors in the n- 
dimensional binary cube and visualizes them in 2-D as chains of Boolean 
vectors. Actual Boolean vectors are laid out on this chain structure. Cur- 
rently the system supports two visual forms: the multiple disk form and the 
“Yin/Yang” form. 

Part 5 concludes the book with geospatial data analysis, decision making
and problem solving and consists of Chapters 17-21. This focus is not
accidental - geospatial problems are naturally visual and spatial. Chapter
17 features a general framework for combining geospatial datasets. The
framework is task-driven and includes the development of task-specific
measures, the use of a task-driven conflation agent, and the identification
of task-related default parameters. The chapter also describes measures of
decision
correctness and the visualization of decisions and conflict resolution by us- 
ing analytical and visual conflation agents. Finally, the chapter elaborates 
mathematical (geometric and topological) techniques for decision making 
and problem solving for combining geospatial data. 

Chapter 18 addresses imagery conflation and registration problems by 
providing an Analytical and Visual Decision Framework (AVDF). This 
framework recognizes that pure analytical methods are not sufficient for 
solving spatial analysis problems such as integrating images. Without 
AVDF, the mapping between two input data sources is more opportunistic 
than definitive. A partial differential equation approach is used to illustrate
the modeling of disparities between data sources for a given mapping func- 
tion. A specific case study of AVDF for pixel-level conflation is presented 
based on Shannon’s concept of mutual entropy. The chapter also demon- 
strates a method of computation reduction for defining overlapping image 
areas. 

Chapter 19 looks at spatial decision making and analysis, which heavily 
depend on the quality of image registration and conflation. An approach 
based on algebraic invariants for the conflation/registration of images that
does not depend on identifying common points is developed. This new
approach grew from a careful review of other conflation processes based on
computational topology and geometry. This chapter describes the theory of
algebraic invariants and describes a conflation/registration method and
measures of correctness for feature matching and conflation. 

Chapter 20 presents technology for conflation algorithm development 
with a wide applicability domain. The sequence of steps starts from vague 
but relevant expert concepts and ends with an implemented conflation algo- 
rithm. The generic steps are illustrated with examples of specific steps from 
the development history of an area-based “shape size ratio” conflation algo- 
rithm. The fundamental “shape size ratio” measure underlying the algorithm 
has rather strong invariance properties, including invariance to dispropor- 
tional scaling. 

Chapter 21 presents an Artificial Intelligence technique for generating 
(on-the-fly) rules of visual decision making for use by experienced imagery 
analysts. The chapter addresses the construction of a methodology and tools 
that can assist in building a knowledge base of imagery analysis. Further, the 
chapter provides a framework for an imagery virtual expert system that sup- 
ports imagery registration and conflation tasks. The approach involves three 
strategies: recording expertise on-the-fly, extracting information from the 
expert in an optimized way using the theory of monotone Boolean functions, 
and using iconized ontologies to build a conflation method. 

This book is the first guide to focus on visual decision making and
problem solving in general and for geospatial applications specifically. It
combines theory and real-world practice. The book includes uniformly edited
contributions from a multidisciplinary team of experts. Note that the book
is not a collection of independent contributions, but rather a book of
interconnected chapters. The book is unique in its integration of modern
symbolic and visual approaches to decision making and problem solving. As
such, it ties together the monograph and textbook literature in this new
emerging area. Each chapter ends with a summary and exercises.

The intended audience of this book is professionals and students in
computer science, applied mathematics, imaging science and Geospatial
Information Systems (GIS). Thus, the book can be used as a text for
advanced courses on subjects such as modeling, computer graphics,
visualization, image processing, data mining, GIS, and algorithm analysis.

We would like to begin our acknowledgements by thanking all the
contributing authors for their efforts. A significant part of the work
presented in this book has been sponsored by the US Intelligence Community,
the Department of Defense, and the Department of Energy. Authors of individual
chapters have made such acknowledgements in their respective chapters. 
Several other chapters have been supported by European funding agencies 
that are also acknowledged in the individual chapters. All support is grate- 
fully acknowledged. Special thanks go to ARDA/NGA GI2Vis Program and 
NGA Academic Research Program managers, panel members, and partici- 
pants for their interest, stimulating discussions and a variety of support. Sev- 
eral students contributed to this book as co-authors and others assisted us in 
other forms. Richard Boyce, Mark Curtiss, Steven Heinz, Ping Jang, Bea 
Koempel-Thomas, Ashur Odah, Paul Martinez, Logan Riggs, Jamie Powers, 
and Chris Watson provided such assistance. 

Please find book-related information at www.cwu.edu/~borisk/bookVis. 

PART 1

VISUAL DECISION MAKING 




Chapter 1 

DECISION PROCESS AND ITS VISUAL 
ASPECTS 



Boris Kovalerchuk 

Central Washington University 



Abstract: This chapter provides a conceptual link between the
decision-making process, visualization, visual discovery, and visual
reasoning. A structural model of the decision-making process is offered
along with the relevant visual aspects. Examples of the USS Cole incident
in 2000 and the cholera epidemic in London in 1854 illustrate the
conceptual approach. A task-driven visualization is described as a part of
the decision-making process and illustrated with browsing and search tasks.

A picture is worth a thousand words.

Proverb 

A picture tells a thousand lies. 

Many, 2003 

Sometimes, a picture is better than a thousand words. 

NGA, Pathfinder, 2004 



1. CURRENT TRENDS 

Visual problem solving has been known for millennia as both great suc- 
cess and failure in science, mathematics, and technology. The quotes above 
sum up these facts in a few words. Below we present current trends in this 
area that indicate that the field is moving (1) from mostly geometric visuals 
to more abstract algebraic, symbolic visuals; (2) from the visualization of 
solutions to finding solutions visually; (3) from visual data mining to
visual decision making; (4) from drawing tools to visual discovery and conceptual
analysis, and (5) from abstract decision models to visual decision models. 

1.1 From geometric visuals to algebraic symbolic
visuals

The Pythagorean Theorem was proven by visual means more than 2000 
years ago. The entire study of geometry existed as a visual problem-solving 
field more than one and a half millennia before Rene Descartes invented 
symbolic coordinates. 

In 1953, Albert Einstein wrote that the development of Western Science 
is based on two great achievements: the invention of the formal logic system 
(in Euclidean geometry) and reasoning based on systematic experimentation 
during the Renaissance. For us it is important to notice that the formal logi- 
cal system in Euclidean geometry was visual. 

Historically in mathematics visuals are associated with geometry that can 
be traced to the concept of the number in Greek mathematics. This contrasts 
with “non-visual,” abstract mathematics that began with Descartes and ana- 
lytic geometry [Schaaf, 1930]. In other words, this is the fundamental con- 
ceptual difference between concrete visual forms (in geometry) and abstract 
forms (in algebra) of mathematics. Typically, an abstract algebraic form is 
not considered to be a visual form although abstract symbols, icons, are 
graphical representations. 

However, the point is that the algebraic abstract form is also visual and in 
some sense, the algebraic visual form is more general than the geometric 
visual form. To be abstract does not necessarily mean to be non-visual. The 
concept can be visual, abstract and very productive simultaneously. Impor- 
tant differences between abstract and concrete hide significant similarities 
between geometric and algebraic approaches - both of them are visual 
forms, but concrete and abstract respectively. 

The geometric form is often an individual invention for a specific task
that is not applicable to other tasks. This was one of the main reasons
that mathematics moved from the geometric proofs of Greek mathematics to
the more abstract Cartesian mathematics, which made it possible to work on
geometric problems in an algebraic form.

Chapters 5 and 7 of this book show the productivity of the algebraic visual
approach using historical examples. This productivity derives from the fact
that solving an algebraic equation in symbolic form is much more efficient
than using words or geometry.

1.2 From visualization of solution to finding solution 
visually 

Visualization of a solution is quite different from finding a solution visu- 
ally. Thus, there is a significant conceptual and practical difference between 

(1) visualization of an already identified solution and

(2) visually finding the solution. 

The second task is much more ambitious and difficult especially for non- 
structured problems. A typical example of the first task is the animation of a 
known algorithm for a novice, to visually demonstrate the steps of an algo- 
rithm. 

A different situation arises with the second task when we use animation 
to visually discover properties of an algorithm such as the number of loops 
and the amount of space required for the task. 

If an animation tool permits viewing loops and space used, then it can 
serve as a visual problem-solving tool. Thus, it is possible that the same 
technique could be used for both tasks. This observation shows that the es- 
sence of the transition from task 1 to task 2 is not the visual technique itself 
(e.g. animation). 

The essence of task 2 is in matching a decision-making, problem-solving 
task with a visual technique. Note that a simple animation technique
appropriate for the novice would likely not be sufficient for advanced
visual decision making on algorithm efficiency by an experienced analyst.
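
The distinction between the two tasks can be made concrete with a minimal
sketch. It assumes a bubble sort as the "known algorithm"; the function name
bubble_sort_frames and the frame fields are hypothetical and are not taken
from any tool discussed in this book. Each recorded frame can be replayed as
an animation step for a novice (task 1), while the loop counters stored in
the same frames let an analyst read off how much work was done (task 2). A
counter for the space used could be recorded in the same way.

```python
def bubble_sort_frames(values):
    """Sort a copy of `values` and record one frame per inner-loop step.

    Each frame holds the current array state (for animation) together with
    running loop counters (for analysis of the algorithm's behavior).
    """
    data = list(values)
    frames = []
    outer_loops = 0
    inner_loops = 0
    n = len(data)
    for i in range(n - 1):
        outer_loops += 1
        swapped = False
        for j in range(n - 1 - i):
            inner_loops += 1
            if data[j] > data[j + 1]:
                # Swap neighbors that are out of order.
                data[j], data[j + 1] = data[j + 1], data[j]
                swapped = True
            frames.append(
                {"state": tuple(data), "outer": outer_loops, "inner": inner_loops}
            )
        if not swapped:
            # No swaps in this pass: the data is already sorted.
            break
    return data, frames


if __name__ == "__main__":
    result, frames = bubble_sort_frames([3, 1, 2])
    for frame in frames:
        print(frame)
```

The same recorded frames serve both purposes; what differs is whether the
viewer only watches the states or also inspects the counters.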



1.3 From visual data mining to visual decision making 

Visual data mining discovers useful regularities visually or visualizes 
patterns discovered by common data mining tools such as neural networks 
and decision trees. Visual decision making produces decisions that rely 
heavily on visual data mining, but useful regularities are only a part of the 
whole decision-making process. 

In finance, visual data mining helps to discover market trends. Visual in- 
vestment decision making is used to produce buy/sell signals and to help 
select a portfolio. 

In medicine, visual data mining helps with patient diagnosis. Decisions 
about a course of treatment are heavily based on the diagnosis, but this is 
only a part of a treatment decision. For instance, a treatment decision should 
avoid a patient’s allergic reactions and other negative side effects. Diagnos- 
tic data mining does not cover this issue. 

In defense applications, visual data mining can provide valuable intelli- 
gence clues for planning an operation. A visual decision about a military 
strike requires that much more information be considered, including the 
availability of military forces and political realities. Visual data mining is 
also useful in breaking drug trafficking rings, but the actual decision on these
types of operations should involve more aspects of the problem. 

1.4 From drawing tools to visual discovery and
conceptual analysis

Let us begin our considerations here with an analysis that uses
architectural Computer Aided Design (CAD) tools. Typical CAD tools help an
architect implement an architectural solution in the same way that a pen
helps us record our everyday solutions. This is obviously useful, but it
does not guide an architectural solution. Finding an architectural solution
is an extremely difficult task, because in essence architecture is art.
This means that we need a visual discovery tool that is more complex than
the visualization of an already discovered solution. One might argue that
the architect can start sketching without a clear solution and can discover
it in the process of using a CAD tool. Thus, the CAD tool would be a
discovery tool too. We would disagree. Leo Tolstoy wrote and rewrote some
of his famous works many times with a simple pen. Should we call his pen a
discovery tool?

It is extremely difficult to distinguish between a genuine visual discov- 
ery tool and a less sophisticated tool. There are two extremes: 

1. visual tools that provide an algorithmic solution and 

2. visual tools that only help in recording a solution. 

Most available tools lie somewhere between these two extremes. They sup- 
port recording and visualization of the solution and partially support discov- 
ery of the solution. 

At first glance, it may be surprising, but one of the best-known visual
techniques that provides an algorithmic solution is basic elementary school
arithmetic, known for centuries. For instance, to add 35 and 17 we (a) write
them one under another, (b) add 5 and 7, (c) write 2, which is the lower
digit of the sum 5+7, below the result line, (d) write 1, which is the
higher digit of the sum 5+7 and is known as the carry, in a separate
location, (e) add the next digits 3 and 1, (f) add the carry to them, and
(g) write the result 5 below the result line next to the 2. This textual
explanation is less clear than the graphical form in which 17 is actually
written under 35.

What is important in this example? The example shows an algorithmic 
process, not an art, where the result varies from person to person. Every per- 
son with an elementary school background should produce the same number 
52 when the sequence described above is followed. 
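
The algorithmic character of this procedure can be illustrated with a short
sketch of column addition with a carry. The function name column_add and the
wording of the recorded steps are hypothetical; the sketch assumes
non-negative integers written in base 10.

```python
def column_add(x: int, y: int):
    """Add two non-negative integers column by column, as done on paper.

    Returns the sum and a list of human-readable steps, one per column.
    """
    # Digits are stored right to left, so index 0 is the ones column.
    xs = [int(d) for d in str(x)][::-1]
    ys = [int(d) for d in str(y)][::-1]
    carry = 0
    result_digits = []
    steps = []
    for i in range(max(len(xs), len(ys))):
        dx = xs[i] if i < len(xs) else 0
        dy = ys[i] if i < len(ys) else 0
        total = dx + dy + carry
        digit, new_carry = total % 10, total // 10
        steps.append(
            f"column {i}: {dx} + {dy} + carry {carry} = {total}; "
            f"write {digit}, carry {new_carry}"
        )
        result_digits.append(digit)
        carry = new_carry
    if carry:
        # A leftover carry becomes the leading digit of the result.
        result_digits.append(carry)
        steps.append(f"write final carry {carry}")
    value = int("".join(str(d) for d in reversed(result_digits)))
    return value, steps


if __name__ == "__main__":
    total, steps = column_add(35, 17)
    for step in steps:
        print(step)
    print(total)
```

Every run on the same inputs follows the same steps and yields the same
result, which is what separates this kind of procedure from an artistic one.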

In architectural design, different architects can add two buildings to each
other very differently. The CAD tool will support any of these solutions
without guiding the solution. Thus, this process is not algorithmic but
rather artistic. One can argue that adding a building is a much more
complex task than adding numbers. This is true, but thousands of years ago
adding numbers was not an easy task either, nor is it an easy task for
elementary school students today. Later in this book, Chapter 4, written by
D. Barker-Plummer and J. Etchemendy, discusses in depth the visual
reasoning problems in architectural design.

Visual problem-solving and decision-making tasks and approaches 
have a natural scale. This is the assistance level that can be provided by a 
decision support system (DSS). Such a scale is a semantic scale that represents
a specific type of semantic differentials [Osgood, 1952]. Our semantic scale 
has two extremes: 

• recording and visualization tools, and 

• mathematical, algorithmic tools and proofs. 

This scale is depicted in Figure 1. Typical current CAD systems are obvi- 
ously not fully algorithmic. 




[Figure 1: a scale running from recording and visualization tools, through
interactive algorithmic tools, to automatic algorithmic tools and proofs]

Figure 1. Assistance level scale

Recording and visualization tools are exemplified by visualization of the
proof of the Pythagorean Theorem. Note that for 2000 years there has been no
automatic algorithmic tool for discovering the theorem $a^2+b^2=c^2$ better
than a heuristic search in the set of all possible formulas of the form
$a^n+b^m=c^k$. There are only verification tools that can prove an already
discovered theorem statement, $a^2+b^2=c^2$.
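
To make the idea of a search over candidate formulas concrete, the sketch
below tries small integer exponents and keeps those that agree with a few
known right triangles. It is a deliberately naive stand-in: it is exhaustive
rather than heuristic, the sample triangles and the names TRIANGLES, fits and
search are our assumptions, and agreement with examples is only evidence for
a hypothesis, not a proof of it.

```python
from itertools import product

# Assumed sample of right triangles: legs a, b and hypotenuse c.
TRIANGLES = [(3, 4, 5), (5, 12, 13), (8, 15, 17), (7, 24, 25)]


def fits(n: int, m: int, k: int, triangles=TRIANGLES) -> bool:
    """True if a**n + b**m == c**k holds for every sample triangle."""
    return all(a**n + b**m == c**k for a, b, c in triangles)


def search(max_exponent: int = 4) -> list[tuple[int, int, int]]:
    """Try every exponent triple (n, m, k) up to max_exponent."""
    exponents = range(1, max_exponent + 1)
    return [(n, m, k) for n, m, k in product(exponents, repeat=3)
            if fits(n, m, k)]


print(search())
```

The search only generates a hypothesis; verifying it for all right triangles
still requires a proof such as the visual ones discussed in this chapter.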

Another field that is moving from drawing tools to a deeper conceptual
analysis is the design of software specifications, especially in enterprise
information systems.

Visual tools include traditional whiteboards and drawing tools with boxes 
and various connectors such as Visio and Unified Modeling Language 
(UML) diagrams. These tools are not limited to diagram drawing. UML
permits one to capture the whole object-oriented logic of a design for further 
detailed programming. 

However, visual support for upper-level conceptual data modeling with
domain experts is arguably less than ideal [Halpin, 2000]. Other options that
facilitate conceptual level design with visuals now include the Object Role
Modeling (ORM) language combined with ontologies presented in the DAML+OIL
language. This would permit the derivation of UML class diagrams integrated
with ontologies.

1.5 From abstract decision model to visual decision models

The traditional mathematical decision-making and problem-solving ap- 
proach assumes that a problem should be structured and presented as a for- 
mal mathematical model. This often requires a double conversion: from a 
visual form into a formal model and then from the formal model back to a 
visual form (see Figure 2). 

For many real-world problems such as CAD and imagery analysis, build- 
ing a formal model can be too complex to be realistic. On the other hand, the 
efforts to build a formal mathematical model can be unnecessary when solv- 
ing those problems visually. 




Figure 2. Double conversion process 

There is a potential that a visual model can substitute for or augment an
abstract model to some extent. The above-mentioned example of a visual proof
of the Pythagorean Theorem 2000 years ago shows that this has been feasible
for a long time. Now is the time to expand this field of study to problems
that are posed visually from the beginning. Problems involving satellite
imagery and map fusion and geospatial data conflation are among them. There
are many other such problems in a variety of fields. The challenge is to find
a way to minimize the double conversion effort.

Conversion is so common now that we often do not notice it. Modern
digital computer architecture relies on the internal binary representation of
numbers. This is not a convenient, compact visual form that a normal human
being can operate with: the decimal number 1024 is equivalent to the binary
number 10000000000. Thus, decimals are converted to binary and the results are
converted to decimals again. Recall that some early computer architectures
in the 1940s and 1950s were decimal based (ENIAC) while others were based
on Roman numerals (THROBAC I) without conversion to binary. The last
example is especially notable, because Claude Shannon, known as the father of
information theory, designed it. His THrifty ROman-numeral BAckward-looking
Computer was able to add, subtract, multiply and even divide numbers up to 85
working only with Roman numerals [Calderbank & Sloane, 2001]. It is
interesting to understand why the double conversion happened in the computer
architecture field. The answer is that binary digital logic devices were
simpler to build and conversion to and from binary was not too difficult. In
other problems, conversion may be, and often is, a significant or even
formidable challenge. Often the original representation of a task is too
complex to be solved, and a relatively simple visualization is impossible if
we try to build the visual for the task “as is.” Simplification of a task’s
representation may be the first step for visual data mining and
decision-making, as discussed in Chapters 15 and 16.
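
The double conversion in the binary example can be sketched in a few lines of
Python: the human-readable decimal form is converted to the internal binary
form, the work is done there, and the result is converted back. The function
names below (to_binary, add_binary, from_binary, add_via_binary) are
illustrative assumptions, and the bit-by-bit adder is a simplified model, not
a description of any particular machine.

```python
def to_binary(decimal_text: str) -> str:
    """First conversion: decimal notation to a string of bits."""
    return bin(int(decimal_text))[2:]


def add_binary(p: str, q: str) -> str:
    """Add two bit strings column by column with a carry."""
    width = max(len(p), len(q))
    p, q = p.zfill(width), q.zfill(width)
    out, carry = [], 0
    for bit_p, bit_q in zip(reversed(p), reversed(q)):
        total = int(bit_p) + int(bit_q) + carry
        out.append(str(total % 2))
        carry = total // 2
    if carry:
        out.append("1")
    return "".join(reversed(out))


def from_binary(bits: str) -> str:
    """Second conversion: bits back to decimal notation."""
    return str(int(bits, 2))


def add_via_binary(x: str, y: str) -> str:
    """Decimal in, decimal out, with all the work done in binary."""
    return from_binary(add_binary(to_binary(x), to_binary(y)))


print(to_binary("1024"), add_via_binary("35", "17"))
```

Here both conversions are one-line library calls, which is exactly why the
double conversion goes unnoticed; for visually posed problems the analogous
conversions can dominate the effort.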



2. CATEGORIES OF VISUALS 



2.1 Illustration, reasoning and discovery 

Visuals can be classified in three categories: (1) illustration, (2) reason- 
ing, and (3) discovery. Illustration means showing the essence of objects, 
events, solutions, decisions, or statements. Reasoning means showing why 
these are relevant objects, events, solutions, decisions, or statements. And 
discovery means showing how to find relevant objects, events, solutions, 
decisions or statements. These categories form another semantic scale that 
we call the creativity scale for visuals. This scale is illustrated in Figure 3 
using the term statement as an umbrella term for any category to be pre- 
sented visually. Illustration and discovery are the two extremes in this scale 
with many intermediate mixed cases. Reasoning occupies the middle of this 
scale. 



[Figure 3: a scale with three marks: Illustration (visualization of statement
S), Reasoning (visualization of proof of S), and Discovery (visual discovery
of S)]

Figure 3. Scale of creativity levels of visual problem solving methods

A pure illustration is chess notation, which permits us to replay a
winning game but does not give us a clue on how to select a winning strategy.
The discovery category is clearly the most difficult; thus, it is not
accidental that there is no visualization in this category for the Pythagorean
Theorem although the Theorem has existed for 2000 years and was proved
using geometric diagrams many times. Indeed, more than 370 different
geometric proofs have been published [Loomis, 1968]. All of them represent the
reasoning category on the creativity scale. One of the visual proofs, shown in
Figure 4, is in essence the same as that used by Euclid. Now it is an animated
Java applet [Morey, 1995]. Figure 5 is quite different from Figure 4 and fits
more readily into the illustration category. Figure 5 both visualizes and
visually proves the statement of the Theorem for a triangle with sides of
length 3, 4, and 5, i.e., $3^2+4^2=5^2$. It does not prove the Theorem’s
general statement that $a^2+b^2=c^2$ for any right triangle.

Both Figures 4 and 5 lack the ability to show us how the theorem could be
discovered. A proof deals with a hypothesis that should be proved (verified)
or refuted. To do this we first need to generate that hypothesis. That is, we
need a visual process that helps generate an initial set of reasonable
hypotheses that includes the true theorem statement. Next, we need to test the
hypotheses visually. Without a discovery process the situation would be
similar to finding an exit from a maze using random trials. Thus, we
distinguish three categories of visuals applied to theorems:

1) Illustration: visualization of the theorem statement (Figure 5),

2) Reasoning (verification): visualization of the proof process for the
theorem’s statement (see Figure 4, where triangles are moved without
changing their areas), and

3) Discovery: visualization of the discovery process that identifies the
theorem’s statement as a hypothesis.




Figure 4. Visual proof of Pythagorean Theorem (screenshots of Morey’s animation applet)

Figure 5. Result visualization for Pythagorean Theorem

For the Pythagorean Theorem, when we are not proving the theorem but
using its proved result ($a^2+b^2=c^2$) in a particular situation (e.g., a=3,
b=4), the reasoning step will be computing the specific numeric result,

$c = 5 = \sqrt{3^2+4^2}$,

by applying the theorem. Thus, for mathematical tasks based on the use of
the theorems we can describe categories of visuals in the following way:

• Illustration: visualization of the solution of an individual task based
on the use of the theorem (e.g., Figure 4 illustrates both the general
statement of the Pythagorean Theorem and an individual task with
specific sides 3, 4 and 5).

• Reasoning: verification (proof) of the theorem, and computation of
the result by applying the theorem, e.g., computing side c of the
right triangle given sides a and b.

• Discovery: visualization of the discovery process that identifies
the theorem’s statement as a hypothesis.

In visual decision making the listed categories have their counterparts: 

• Illustration: visualization of the decision (solution) statement 

• Reasoning: explanation of why this is a correct decision, verification 
of the hypothesis, and visualization of the reasoning process that 
leads to the decision statement using a verified hypothesis. 

• Discovery: visualization of the process of hypothesis discovery. 



2.2 Informal, heuristic, and rigorous visual decision making



The scale described in Figure 3 represents a creativity level for visuals
ranging from illustration through reasoning and on to discovery. It does not,
however, cover another important aspect of visual decision making and problem
solving: the algorithmic-level scale. This scale can be characterized
over the following range: informal (“artistic”) approach, heuristic algorithm,
rigorous (full-solution) algorithm. This scale is shown in Figure 6.

An informal algorithm can be an instruction: “Follow good previous 
practice” accompanied by a few examples of “good practice.” The success of
a heuristic algorithm depends strongly on the case and the skills of the sub- 
ject matter expert (SME). A full-solution algorithm provides a rigorous and 
unambiguous way of producing a solution. 

[Figure 6: a scale with three marks: informal algorithm, heuristic algorithm,
and rigorous algorithm]

Figure 6. Scale of algorithmic level of a visual problem solving method



Figure 7 combines the scales for algorithmic and creativity levels. Here
the sizes of circles indicate the relative number of methods currently
available (part a) and the relative number of methods desired (part b). It is
obvious from this figure that new methods dealing with full-solution
algorithms for discovery tasks are in short supply.

Figure 7. Combined scales for algorithmic and creativity levels

Below we provide some examples of tasks that occupy the extreme cells
in Figure 7. The first one is an illustration (I) combined with an informal
algorithm (IA) for accomplishing a task, <I, IA>. Architectural drawing using
CAD tools falls into this category. Another extreme is a discovery task (D)
combined with a full-solution algorithm (FA) for discovery, <D, FA>.
Discovery of the number π by approximating a circle with polygons belongs to
this category. Increasing the number of sides in the polygon increases the
accuracy of the solution. The two other extremes are represented by <D, IA>
and <I, FA>. Here <D, IA> means the discovery of the solution with an informal
algorithm in hand. A variety of guidelines in architectural and engineering
design reside in this category. Such tools do not go beyond providing insight
for finding a solution. The case described by <I, FA> is an illustration of an
algorithm that provides a full solution. For instance, it can show the regular
polygon with, say, 100 sides and the approximation to the number π that it
provides. Figure 5 represents another example in this category.
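
The polygon approximation mentioned above is a full-solution algorithm in the
sense that each step is unambiguous and accuracy improves as sides are added.
A minimal sketch follows, assuming regular polygons inscribed in a circle of
radius 1 and the classical side-doubling recurrence; the function name
pi_by_polygons is ours. Because the number of sides doubles from a hexagon,
the sketch reaches 96 sides rather than the 100 used as an example in the
text.

```python
import math


def pi_by_polygons(doublings: int) -> list[tuple[int, float]]:
    """Estimate pi as half the perimeter of inscribed regular polygons.

    Starts from a hexagon inscribed in a unit circle (side length 1)
    and doubles the number of sides `doublings` times.
    """
    sides, side_len = 6, 1.0
    estimates = [(sides, sides * side_len / 2)]
    for _ in range(doublings):
        # Side of the 2n-gon from the side of the n-gon; this form of
        # the recurrence avoids subtracting nearly equal numbers.
        side_len = side_len / math.sqrt(2 + math.sqrt(4 - side_len**2))
        sides *= 2
        estimates.append((sides, sides * side_len / 2))
    return estimates


for n, estimate in pi_by_polygons(4):
    print(n, estimate)
```

Each printed estimate is larger than the previous one and closer to π, which
is the visual content of drawing polygons with more and more sides.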

An example of the combination of reasoning (R) with a full-solution al- 
gorithm <R, FA> is the visual verification of the Pythagorean Theorem (see 
Figure 4). 

The center point on both scales in Figure 7 is a combination of reasoning 
with a heuristic algorithm (HA) or <R, HA>. The visual, interactive schedul- 
ing of jobs using heuristic, greedy strategies such as “largest jobs first” is a 
representative of this category. 

The creativity scale can be further elaborated by distinguishing between 
the Discovery of an Individual solution (DI) and the Discovery of a Process 
(DP) that can lead to several individual solutions. 

Reasoning also has two subcategories. The first one is finding a solution 
by Applying a verified solution Process (AP), e.g., finding the hypotenuse 
for the triangle with sides 3 and 4 by applying the general statement of the 
Pythagorean Theorem. The second more challenging task is Verification of 
the solution Process (VP), e.g., proving the Pythagorean Theorem. 

In Chapters 5 and 7, we provide more examples of visualization as
illustration, visual reasoning, and visual discovery from the history of
mathematics. The examples include the process of discovery (number π) and
visual counting. Such analysis establishes a background for developing visual
decision-making processes for modern tasks.



3. A MODELING APPROACH 

Every day the mass media show impressive visualizations of events in
the world. One of them was published by “Time” magazine about the attack
on the USS Cole in Aden in October 2000 [Ratnesar, 2000]. It provided a rich
multilevel visualization. The visual is a sequence of increasingly focused
pictorials that starts with a view of the world and ends up depicting an
individual injured sailor. The visual presents six levels of detailed
visualization in the process: (1) world, (2) region, (3) port, (4) ship, (5)
damage area, and (6) sailor. While the visualization shows many details
about the USS Cole’s equipment including armament, it does not help much in
decision making, namely, in how to prevent such deadly attacks. There are
two reasons for that: (1) decision making is not a mass media goal and (2)
“rich” information is actually scarce for decision making. Figure 8 shows an
iconic summary of this visualization that we constructed. This summary
quickly shows that the visualization from “Time,” while being creative and
impressive, does not contain much information useful for decision making. It
is clear that this is an example of the illustrative level of visualization
only, but when the task is that of guiding reasoning and decision making,
higher levels of visualization are needed.

Figure 8. Structure of visualization of the attack on USS Cole



Consider next another example, described by E. Tufte [Tufte, 1997], on
the cholera epidemic in London in 1854, using the original work by Dr. J.
Snow [Snow, 1855]. This example includes several visualizations. Some of
them show the growing death toll day by day in September 1854 ("within two
hundred and fifty yards of the spot where Cambridge Street joins Broad
Street, there were upwards of five hundred fatal attacks of cholera in ten
days" [Tufte, 1997; Snow, 1855]). However, these visualizations did not
help make decisions regarding how to stop the epidemic.

There was another visualization that matched/correlated the death tolls
with locations of water pumps/wells. Specifically, Dr. Snow marked deaths
from cholera on a map, along with locations of the area's 11 community
water pumps/wells. This visualization proved extremely useful for
decision-making (DM). It prompted the authorities to shut down a specific pump
located in the area with a high death toll.

An analysis of these examples shows that the first case (Aden) lacks a 
discovered relation between the attack and attributes useful for decision 
making to prevent such attacks in the future. 

The second example obviously has such a relation, which was discovered
by Dr. Snow on September 7, 1854, and which allowed the Board of Guardians
of St. James's Parish to make a decision to prevent a further spread of the
cholera by shutting down that well on September 8, 1854. The epidemic
ended within two weeks.

These two examples help us to make a point about the concept of decision
making visualization (DMV); namely, that it is a visualization useful for
decision making based on:

• a discovered relation/pattern (DRP) and 

• a decision making model (DMM). 

The first example (Aden) is creatively impressive, but does not include the 
components DRP and DMM. The second example (London) includes both 
of them: 

1. Discovered relation - people who used water from well d (death) on
Broad St. died more often from cholera than people who used any other
well:

$\forall i\ (i \neq d)\ D(d) > D(i)$,

where D(i) is the number of dead after drinking water from well i.

“There were only ten deaths in houses situated decidedly nearer to an- 
other street Broad St. pump.” [Tufte, 1997; Snow, 1855]. 

2. Decision making model - shut down a well d if the death toll of people
who used this well is higher than that for people who used other wells:

$\forall i\ (i \neq d)\ D(d) > D(i) \Rightarrow \mathrm{Shutdown}(d)$.

The DMM is very simple and people often do not even notice that the model
is there. However, this simple model is the result of a very non-trivial
discovery by Dr. Snow of the relation between the use of well water and the
death toll [Tufte, 1997; Snow, 1855].
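
The simplicity of this decision making model can be seen by writing it as a
rule over a table of death tolls per well. The sketch below is only an
illustration: the function name well_to_shut_down is ours, and the numbers in
the usage example are invented placeholders, not Dr. Snow's data.

```python
from typing import Optional


def well_to_shut_down(death_toll: dict[str, int]) -> Optional[str]:
    """Return well d if D(d) > D(i) for every other well i, else None."""
    if not death_toll:
        return None
    d = max(death_toll, key=death_toll.get)
    strictly_highest = all(
        death_toll[d] > toll
        for well, toll in death_toll.items()
        if well != d
    )
    return d if strictly_highest else None


# Placeholder figures for illustration only.
example = {"Broad St.": 500, "well B": 10, "well C": 7}
print(well_to_shut_down(example))
```

The rule itself is trivial to evaluate; the hard part, as noted above, is
discovering that the death toll should be tabulated by well in the first
place.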

Next we note that two categories of DMM models are necessary:

(a) A model for the decision-maker (e.g., city managers or the board of
guardians of the parish) and

(b) A model for the analyst (e.g., Dr. Snow) who discovers relations for
a decision-maker.

The model (a) for the decision maker can and should be simple, similar to 
the decision making model in (2.) above. The model (b) for the analyst must 
be complex enough to cover a wide range of possible decision alternatives. 
In the London example, the decision making model (2.) produced a single 
decision alternative - to shut down the pump/well d. A model of type (b) for 
the analyst might include many other alternatives to be explored: 

1. Restrict the access of new people to the city,

2. Restrict the contact between people in the city limit, 

3. Restrict the consumption of certain foods, 

4. Use certain medications, 

5. Restrict the contact of the population with certain animals, 

6. Restrict the consumption of some drinks.

Actually, the research of Dr. Snow resulted in the last alternative
(specifically to restrict/prohibit consumption of water from the well d). We
have no historic evidence that Dr. Snow really considered all the alternatives
(1)-(6). It is most likely that he came to the well water alternative without
a formalized decision making model such as the model (b). Our goal is to show
that if his decision-making and visualization process had been driven by a
DMM with alternatives (1)-(6) then the water alternative (6) would have
surfaced naturally and would have been investigated. This alternative can
guide an investigation (including exploratory visualizations) instead of
relying on the insight of such extraordinary people as Dr. Snow. We illustrate
the concept of the model-based approach in Figure 9.

This conceptual model has two components. The first component in- 
volves an analyst, who builds a DMM model and discovers some relation- 
ships. The second component involves a decision maker who works with 
relationships discovered by the analyst. This work is based on discovered 
and visualized relations and a DMM for actual decision. 




Figure 9. Conceptual Decision Making model structure and visualization 



4. DMM MODEL AND DISCOVERY OF RELATIONS

In this section, we clarify connections between the decision making
model of type (b) with the alternatives (1)-(6) and the discovery of
supporting relations $R_i$. The decision making model (b) is not formulated in
terms of any specific relation. It should be elaborated to include more
objects so that relations between them can then be investigated. For instance,
alternative (6) can be developed to include such objects as water pumps, water
distribution from the pumps, methods of water treatment, and the type of
population. These objects may be suspected of being related to the high death
toll. We call this structured information for the DM. Thus, the DM model will
grow like a tree (see Figure 10). The rectangles show relations to be
investigated.

After providing such structural information, an analyst can investigate re- 
lations between death toll and each of the components: pumps, distribution 
routes, methods of water treatment and type of population. Currently this 
process is done by spatial data mining techniques (see Chapter 12 on the SPIN
system in this book). Visualization is a natural element in this analytical 
process. 
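
One way to picture the tree-like growth of the DM model is as a nested data
structure whose leaves are the objects whose relation to the death toll is to
be investigated. The sketch below is a loose illustration under several
assumptions of ours: the class Node, the field relation_strength, the function
strongest_path, the leaf "food markets", and all numeric strengths are
hypothetical placeholders, not results from the book or from Dr. Snow's data.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    # Placeholder strength of the relation to the death toll (leaves only).
    relation_strength: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def strongest_path(node: Node) -> Tuple[float, List[str]]:
    """Return the root-to-leaf path ending at the strongest relation."""
    if not node.children:
        return (node.relation_strength or 0.0, [node.name])
    strength, path = max(
        (strongest_path(child) for child in node.children),
        key=lambda result: result[0],
    )
    return strength, [node.name] + path


# Hypothetical model: goal -> alternatives -> objects to investigate.
MODEL = Node("decrease death toll", children=[
    Node("restrict some drinks", children=[
        Node("pumps", 0.9),
        Node("distribution routes", 0.3),
        Node("water treatment", 0.5),
        Node("type of population", 0.2),
    ]),
    Node("restrict certain foods", children=[Node("food markets", 0.1)]),
])

print(strongest_path(MODEL))
```

In practice the strengths would come from an analysis such as spatial data
mining; the tree only organizes which relations are worth investigating.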




Figure 10. DM model with potential alternatives and relations 








(a) (b) 

Figure 11. Visual correlation in the process of discovery of relations 



After such an investigation, an analyst can conclude that water treatment
components (boiling, pump operations, and so on) are the objects most highly
correlated with the death toll. This is visualized by marking these components
(see filled ellipses and rectangles in Figure 11a). The next stage of
investigation could discover a very important relation between a pump and the
death toll (as marked in Figure 11b). Conceptually such discovery means
matching the knowledge with likely states based on the context of the problem
[Marsh, 2000]. Sometimes this stage is called understanding.

Now the analyst can report the discovery to decision-makers (city
managers). If they want to be sure that the analyst did not overlook some
important alternatives, then graphs (a) and (b) from Figure 11 provide them
with this information. If the decision-makers just want to consider a course
of action based on the discovered relation, then only the simple path marked
in Figure 11(b) is needed. The details of this path are presented in Figure
12. It shows the discovered pattern, its visual correlation with the decision
(shut down the pump) and the visual correlation of the decision with the
ultimate goal: decrease the death toll.




Figure 12. Final decision making model with discovered relation: visual correlation approach 

Conceptually this stage means interpreting the currently understood 
situation in terms of the desired end states and choosing the response that 
best meets the objective. Marsh [2000] suggested the term appreciation for 
this stage. 

Next we consider how visualization can help to discover the relation be- 
tween pumps and the death toll. This, of course, is the classical work of Dr. 
Snow [Tufte, 1997; Snow, 1855]. Figure 13 presents the idea of Snow's 
visualization that is common now in geospatial visualization and spatial data 
mining. The idea is to bring together a city map with pumps (circles) and 
death locations (squares). 

Figure 13 shows a higher death toll around one of the pumps. This
visualization was critical for discovering the relation between the death toll
and pumps, but once it has been discovered many other visualizations can serve
as well as this one for convincing decision makers that the pump has to be
shut down to save lives.

Figure 13. Mapping pumps and death toll

Figure 14 shows a simple alternate visualization. To get the death toll in
Figure 14 we need to use a specific area around each pump. Figure 14 shows
this plot for a distance of 250 yards from the pumps; other similar graphs can
be drawn for distances of 500, 1000, and 1500 yards. Obviously, Figure 14 is
simpler than the information presented in Figure 13, especially if all 11
pumps studied by Dr. Snow along with all the city areas associated with
these pumps were presented on one map. The map would contain a lot of
information irrelevant to decision making on the cholera epidemic.
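
The step from a map to a per-pump plot amounts to counting the deaths that
fall within a chosen distance of each pump. A minimal sketch follows, assuming
straight-line distance on planar coordinates measured in yards; the function
name death_toll_by_pump and all coordinates are invented for illustration and
are not taken from Dr. Snow's map.

```python
import math


def death_toll_by_pump(pumps, deaths, radius):
    """Count deaths within `radius` of each pump (straight-line distance)."""
    return {
        name: sum(
            1 for (x, y) in deaths
            if math.hypot(x - px, y - py) <= radius
        )
        for name, (px, py) in pumps.items()
    }


# Invented coordinates in yards, for illustration only.
pumps = {"Pump A": (0, 0), "Pump B": (600, 0)}
deaths = [(10, 20), (100, 100), (200, 0), (610, 5), (1000, 1000)]

print(death_toll_by_pump(pumps, deaths, radius=250))
```

Repeating the call with radius values of 500, 1000, and 1500 would give the
data for the other graphs mentioned above.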



[Figure 14: bar chart of the death toll (vertical axis, up to 500) for Pumps A
through F, using the area within 250 yards of each pump]

Figure 14. Visual correlation

Next, note that the visualization in Figure 14 is not new to decision mak- 
ers; they are familiar with this type of plot. This is a standard plot of the rela- 
tion used for checking correlation. Thus, decision makers can concentrate on 
making decisions instead of studying a new method for presenting data. 

We did not find evidence that, with respect to final decision making, the 
simple visual correlation (Figure 14) has any disadvantages in comparison 
with maps such as Figure 13 when a relation is already discovered. It seems 
that a simple visual correlation can serve very well. Moreover, it is possible 
that new developments of visual correlation methods for the final decision- 
making are not needed. 

Consistent use of known visual correlation methods and their combinations
has an obvious advantage: decision makers know and trust them. In
the next section, we review more specifically visualizations suitable for this
stage. We note, however, that the situation can be much different when visual
correlation is needed as a tool for discovery of unknown relations. Discovery
of relations and their visual aspects is a major subject in such areas as
computational intelligence, data mining, machine learning, and knowledge
discovery (see for example [Kovalerchuk & Vityaev, 2000]).



5. CONCEPTUAL DEFINITIONS 

Visualization and visual correlation can support decision-making tasks
more efficiently if their role in this process is clarified on the conceptual
level. To do this we need a conceptual model of the decision making process
itself. Marsh [Marsh, 2000] provided a new conceptual model for the
decision-making process (see Figure 15) and contrasted it with the traditional
model.




Figure 15. A new view of decision making (based on Marsh, 2000) 



Each model element can be implemented with some level of visual support.
Ideally, visualization of an element is derived from its role in the
model. For instance, visualization of data, information, and knowledge is
selected using the goal of the decision-making process as discussed above
for the epidemic example.

The model suggested by [Marsh, 2000] operates with concepts that include
data, information, knowledge, understanding, perception, context, experience,
decision, and goal. Other concepts used in the model are appreciation,
priorities, a doctrine, and constraints. Constraints may include tactics,
techniques, and procedures (TTPs).

There are several, somewhat contradictory, interpretations of these
concepts in the literature. Watson [Watson, 2002] defines data as properties of
things and knowledge as a property of agents predisposing them to act in 
particular circumstances. Boisot [Boisot, 1998] defines information as a 
subset of the data residing in things that activates an agent - it is filtered 
from the data by the agent’s perceptual or conceptual apparatus. Marsh 
[Marsh, 2000] defines knowledge, understanding, and appreciation as fol- 
lows: 

• Knowledge is the matching of available information to known entities 
and behaviors in the real world. 

• Understanding is the matching of the knowledge with one or more 
likely states based on the context of the problem. 

• Appreciation is interpreting the currently understood situation in 
terms of the desired end states and choosing the response that best 
meets the objective. 

Thus, appreciation is defined as a higher form of reasoning that
incorporates both knowledge and understanding. Figures 15 and 16, adapted from
[Marsh, 2000], show the central role of correlation in this view of the
decision-making process. It is consistent with our view of the decision-making
process as described above (see Figures 9-14).

According to Figure 16 correlation is: 

a) a procedure matching information based on observations with multiple 
alternatives (hypotheses) regarding the current situation, or 

b) a procedure matching perception of the situation (based on knowledge 
and assumptions) with multiple alternatives (hypotheses) regarding the 
current situation. 

This concept of correlation is somewhat more general than the traditional 
correlation concept. The traditional concept often assumes that we correlate 
entities of the same modality, e.g., we may correlate stock market data for 
different days. In the correlation concept given above, entities are correlated 
with entities of potentially different nature: information is correlated with 
hypotheses about information; perception of situation is correlated with hy- 
potheses about the situation. In essence, entities are correlated with their 
possible explanations, which may have a very different structure and nature. 

There is also a significant difference between (a) and (b) in the level of
human involvement. For example, in the case of the Challenger catastrophe, as
noted in [Tufte, 1997], a correlation between low temperature and a high
failure rate was established. That is, correlation (a) was in place, but
correlation (b) was not: the perception of the situation was not highly
correlated with the high failure rate. Thus, knowledge existed, but
understanding of the situation did not.




Figure 16. Building knowledge from information and understanding from knowledge
(based on [Marsh, 2000]) 

This definition fits well with the classical mathematical concept of
correlation, but at first glance, it does not include correlation between
observations without any specific hypothesis (alternatives) explicitly
formulated. A close look shows that the concept of correlation in fact assumes
that there are some alternative hypotheses behind the scene. If somebody told
us that A and B are correlated we would ask in what sense, i.e., we want to
know what it means that A and B are correlated. The answer could be that pair
(A, B) is correlated with the hypothesis of a linear relation between A and B,
$b_i = k a_i$, where $B = \{b_i\}$ and $A = \{a_i\}$.
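
One hedged way to read "pair (A, B) is correlated with a hypothesis" is to fit
the hypothesized relation to the data and measure how much of the variation in
B it explains. The sketch below does this for the linear hypothesis
b_i = k*a_i and, for contrast, for a quadratic alternative b_i = k*a_i^2. The
function names fit_k and fit_score, the least-squares criterion, and the
sample data are our assumptions; the score is a simple coefficient of
determination, not a measure prescribed by [Marsh, 2000].

```python
def fit_k(a, b, power):
    """Least-squares k for the hypothesis b_i = k * a_i**power."""
    xs = [x**power for x in a]
    return sum(x * y for x, y in zip(xs, b)) / sum(x * x for x in xs)


def fit_score(a, b, power):
    """Share of the variation in b explained by the fitted hypothesis."""
    k = fit_k(a, b, power)
    mean_b = sum(b) / len(b)
    residual = sum((y - k * x**power) ** 2 for x, y in zip(a, b))
    total = sum((y - mean_b) ** 2 for y in b)
    return 1 - residual / total


# Invented data that follow b = 2 * a**2.
A = [1, 2, 3, 4, 5]
B = [2, 8, 18, 32, 50]

print("linear hypothesis:   ", fit_score(A, B, power=1))
print("quadratic hypothesis:", fit_score(A, B, power=2))
```

The same pair (A, B) scores differently against different hypotheses, which is
the sense in which correlation is always relative to some hypothesis H.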

There could be many other possible alternative hypotheses such as
$b_i = k a_i^2$. Thus, the commonly used expression that A correlates with B
actually is a simplification of the more exact statement that pair (A, B) is
correlated with hypothesis H, where H states the type of relationship between
the data A and B.

In the cholera epidemic example above, the correlation between $A_D$ =
“location of pump D” and $B_D$ = “death rate at the location of pump D”
discovered by Dr. Snow means that pair ($A_D$, $B_D$) correlates with the
hypothesis H, where H is the relationship between $A_D$ and $B_D$ given by:
“death rate at the location of pump D is high”. More formally this relation
can be written as H = High($A_D$, $B_D$).

Alternative hypotheses might correlate a high death rate with other pumps:
H_E = “death rate at the location of pump E is high”, or H_Q = “death rate at
the location of pump Q is high”, using available data, i.e., High(A_E, B_E) and
High(A_Q, B_Q). This example also illustrates that when we correlate the
available information we may find that we know something, but we may not
understand it [Marsh, 2000].
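
The idea that a pair (A, B) is correlated with a particular hypothesis H,
rather than simply "correlated", can be illustrated with a minimal sketch. The
data values, the quadratic alternative, and the helper names (pearson,
fit_to_hypothesis) are assumptions made for this illustration only. Note also
that the Pearson coefficient used here scores a relation of the form
b = k*f(a) + c, which is slightly more general than b = k*f(a).

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

def fit_to_hypothesis(a, b, transform):
    """How well the pair (A, B) agrees with the hypothesis b ~ k * transform(a)."""
    return pearson([transform(ai) for ai in a], b)

# Made-up observations, roughly b = 2 * a**2 (illustrative only).
A = [1.0, 2.0, 3.0, 4.0, 5.0]
B = [2.1, 7.9, 18.2, 31.8, 50.1]

# Each hypothesis H states a type of relationship between A and B.
hypotheses = {
    "linear": lambda a: a,
    "quadratic": lambda a: a * a,
}

scores = {name: fit_to_hypothesis(A, B, f) for name, f in hypotheses.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Both hypotheses give a high coefficient for these data, which is the point of
the argument: saying that "A correlates with B" is incomplete until the
hypothesis being scored is named.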

In the example above, we know that the location of pump D is correlated with
a high death rate, but we do not understand why this is the case. Historically,
such an understanding came much later, when cholera bacteria were discovered.
Yet for decision making to stop the cholera epidemic, we do not need that deep
an understanding. We just need to understand how the correlation can be
exploited to stop the epidemic. Thus, we need to interpret the discovered
correlation in a practical way. Keeping in mind that understanding was defined
as matching the knowledge with one or more likely states based on the context
of the problem [Marsh, 2000], we need to match the discovered relationship for
pump D with the current status of pump D. We may find out that it is turned
on. This will be our level of understanding of the situation.




Figure 17. Visualization of epidemic decision making model

Applying Marsh’s approach, the next step is to appreciate the situation,
i.e., to interpret the currently understood situation in terms of the desired
states and choose the response that best meets the objective.

Recognizing that shutting down pump D is a desired state would be the 
appreciation of the situation. Then the actual decision making could be to 
order pump D to shut down. This is illustrated in Figure 17, which combines 
Figures 11, 12 and 15. 
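
The chain from knowledge through understanding and appreciation to a decision
can be written out as a small sketch. The death counts, the pump statuses, the
threshold, and the function name high are hypothetical and chosen only to
mirror the pump D example; they are not historical data.

```python
# Hypothetical death counts near each pump (illustrative numbers only).
deaths_near_pump = {"D": 89, "E": 7, "Q": 4}
# Hypothetical current status of each pump.
pump_status = {"D": "on", "E": "on", "Q": "off"}

def high(pump, threshold=50):
    """The relation High(A, B): the death rate at this pump's location is high."""
    return deaths_near_pump[pump] >= threshold

# Knowledge: the hypotheses "death rate at pump X is high" supported by the data.
knowledge = [pump for pump in deaths_near_pump if high(pump)]

# Understanding: match that knowledge with the current state of each pump.
understanding = {pump: pump_status[pump] for pump in knowledge}

# Appreciation: the desired state for a high-death pump is "off".
# Decision: order a shutdown for every such pump that is still on.
decisions = [f"shut down pump {pump}"
             for pump, status in understanding.items() if status == "on"]
print(decisions)
```

No step in the sketch requires knowing why pump D is associated with deaths,
which matches the observation that the decision does not need the deeper
understanding that came later.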



6. VISUALIZATION FOR BROWSING,
SEARCHING, AND DECISION MAKING

Visualization research and practice have shown that the efficiency of a
visualization depends on both the type of data and the type of task
visualized. However, this common wisdom is often ignored, and a visualization
is selected using only the data type. Below we analyze specific requirements
for the three most common tasks that need visualization support: browsing,
searching, and decision making. First, we clarify both the common and the
significantly different features of these tasks. Common features can receive
the same visualization support, while features that are significantly
different may need different visualization support. The main goal of
visualization that supports browsing is to identify a type of relevant
information more easily. That is, in browsing we wish to identify the search
criteria. Thus, we distinguish browsing from searching, in which the search
criteria are already identified. In browsing the main goal is to get a clue,
to formulate criteria about how actual relevant information may look.

In searching we look for actual relevant information, knowing the search
criteria from the browsing stage. For instance, by browsing a variety of
houses a user can identify that having a lake nearby is more attractive than a
nearby stadium. Searching focuses on finding, in a specific area, actual
houses for sale with lakes nearby. Browsing could be done in the Boston area
to identify the criterion “lake nearby”. An actual search might then be done
in the Seattle area, where a person wants to buy a house. Finally, in this
example the goal of visualization that supports decision making is to make
reaching a decision easier.

From the visualization viewpoint, browsing and decision-making have 
similar requirements: support for observing several entities simultaneously 
to determine a preference. At first glance, it means that both tasks need the 
same visualization support. However, in the example above, even when the
entities involved in the comparison are the same (houses and neighborhoods),
the outputs are different.

In browsing, it is the features of houses and neighborhoods that are
important, while in decision making it is a house as a whole. Thus, browsing
needs support for making all the features of two or more houses visible
simultaneously. In the extreme case, only features would be visualized for
such browsing, without showing the house as a whole.

For the decision-making task, searching has already provided houses with the
needed features, so those features no longer distinguish the candidates.
Decision making instead requires seeing all the features of each house
together, and in particular the differences in features that do not explicitly
belong to our list of search criteria. In other words, we may be interested in
other features (not used as search criteria) that will make the difference in
choosing among the found houses.

If we view searching as a procedure with well-defined search criteria, 
what we are looking for as a visualization tool for searching can be quite 
different from visualization tools for browsing and decision making. It 
should support a query design (visual query) which speeds up search if the 
search is not fully automatic. Visualization of the search space can make 
searching interactive and faster. 
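
The division of labor between searching and decision making in the house
example can be sketched as follows. The house records, field names, and the
helper functions search and differing_features are hypothetical and serve only
to illustrate the distinction drawn above.

```python
# Hypothetical listings; every field and value is made up for illustration.
houses = [
    {"id": 1, "city": "Seattle", "lake_nearby": True, "price": 520, "garage": True},
    {"id": 2, "city": "Seattle", "lake_nearby": True, "price": 480, "garage": True},
    {"id": 3, "city": "Seattle", "lake_nearby": False, "price": 450, "garage": False},
    {"id": 4, "city": "Boston", "lake_nearby": True, "price": 610, "garage": False},
]

def search(houses, criteria):
    """Searching: return the houses that satisfy well-defined search criteria."""
    return [h for h in houses
            if all(h.get(key) == value for key, value in criteria.items())]

def differing_features(found, criteria):
    """Decision support: features outside the search criteria whose values
    differ among the found houses, keyed by feature and then by house id."""
    result = {}
    for feature in found[0]:
        if feature == "id" or feature in criteria:
            continue
        values = {h["id"]: h[feature] for h in found}
        if len(set(values.values())) > 1:
            result[feature] = values
    return result

# Criteria identified earlier by browsing (here: "lake nearby", in Seattle).
criteria = {"city": "Seattle", "lake_nearby": True}
found = search(houses, criteria)
differences = differing_features(found, criteria)
print([h["id"] for h in found], differences)
```

A decision-making display built on this output would emphasize the remaining
differences (here, price) rather than the criteria that every found house
already satisfies.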



7. TASK-DRIVEN APPROACH TO 
VISUALIZATION 

Years of psychological studies have shown that graphics can expedite
task-specific information processing [Ullman, 1964; Larkin & Simon, 1987;
Casner, 1991]. For computational tasks, it was noticed that quick perceptual
inferences, such as judgments of distance, size, spatial coincidence, and
color comparison, are much easier and faster than logical inferences such as
mental arithmetic or numerical comparisons.

For the search tasks, it was also noticed that grouping related informa- 
tion in a single spatial locality, and encoding it by coloring, shading, and 
spatial arrangement efficiently supports preattentive and parallel visual 
search. 

The BOZ system was one of the first visualization systems based on ex- 
plicit task analyses [Casner, 1991]. It substitutes simple perceptual infer- 
ences in place of more demanding logical inferences, and streamlines infor- 
mation search. This system analyzes the logical description of a task and 
designs a provably equivalent perceptual task. Next it produces a graphic to 
support perceptual inference and minimization of visual search along with 
a perceptual procedure describing how to use the graphic to complete the 
task. BOZ generates different presentations of the same information customized
to the requirements of different tasks. Experiments have shown that this
significantly shortened users’ performance time in five different airline
reservation tasks.

Casner [Casner, 1991] argues extensively for task-driven visualization:

Generalizations made about the observed usefulness of a graphic for one 
task are highly inappropriate since using the same graphic for different 
tasks often causes the usefulness of the graphic to disappear. Graphic de- 
sign principles that do not take into account the nature of the task to be 
supported (e.g., “line graphs are best for continuous data”) are too under- 
specified to be useful in general. . . . 

Casner also refers to similar previous judgments of [Jarvenpaa & Dickson, 
1988] and concludes that effective graphic design should begin with a task 
analysis and be focused on finding the parts that might be performed more 
efficiently using a graphic. 

To succeed, the task-driven approach needs a mechanism for matching 
the task (that is not originally in a graphic form) with a graphic. If the se- 
mantics of a task are expressed in propositional formalism then we need a 
graphic represented as sentences in a formal graphical language that has the 
same matched, precise syntax and semantics as the propositional formalism. 
Mackinlay [Mackinlay, 1986] designed such a system, APT, for the specific 
task of providing 2D presentations of relational information. Intensive stud- 
ies of this subject are presented in [Mackinlay & Genesereth, 1985; 
Mackinlay, 1986; Mackinlay, 1988; Card & Mackinlay, 1997]. 

Another idea is to select a presentation format (from a set of predefined
formats) that best matches the characteristics of the task. This approach was
popular in the 1980s in systems such as the following: AIPS [Zdybel,
Greenfield, Yonke, & Gibbons, 1981], BHARAT [Gnanamgari, 1981],
VIEW [Friedell, 1982], and APEX [Feiner, 1985].

The task-driven approach has continued to be a focus of visualization
research over the last ten years [Feijs & de Jong, 2000; Kerpedjiev & Roth,
2000; Kerpedjiev, Carenini, Roth, & Moore, 1997; Shaffer, Reed, Whitmore
& Shaffer, 1999; Zhou, 1999; Beshers & Feiner, 1993]. In general, the
task-driven approach to the design of graphics (also called the task-analytic
approach, the mission-specific approach, and the task-specific approach), in
which graphic presentations are viewed as perceptually manipulated data
structures, helps to improve task performance.

The discussion above allows the formulation of general steps of the
task-driven approach: (1) analyzing a user task, (2) generating equivalent
perceptual tasks that can be performed more efficiently, and (3) designing
accompanying graphics to support efficiency of the perceptual task.
Accomplishing steps (1)-(3) is a significant research challenge even for
relatively simple tasks.
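
The simpler variant described above, selecting a presentation format from a
predefined set according to the characteristics of the task, can be sketched
in a few lines. The format catalog, the names of the perceptual operations,
and the function select_format are all hypothetical; this is not how BOZ, APT,
or any of the cited systems is implemented.

```python
# Hypothetical catalog: the perceptual operations each format supports well.
FORMATS = {
    "table": {"read_exact_value"},
    "bar chart": {"compare_lengths", "find_extreme"},
    "scatter plot": {"judge_distance", "spot_cluster", "find_extreme"},
}

def select_format(task_operations, formats=FORMATS):
    """Pick the predefined format that supports the most operations the task
    needs. Ties go to the format listed first in the catalog."""
    return max(formats, key=lambda name: len(formats[name] & task_operations))

# Step (1), task analysis, is assumed to have produced these operation sets.
ranking_task = {"compare_lengths", "find_extreme"}
lookup_task = {"read_exact_value"}

print(select_format(ranking_task), select_format(lookup_task))
```

The sketch covers only the matching step. Deriving the operation set from a
logical task description, steps (1) and (2) above, is the hard part and is
not attempted here.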



8. CONCLUSION 

In this chapter, we have discussed problems of decision support by using
discovered and visualized patterns. Visualization helps domain experts
discover hidden patterns and correlations by directly augmenting computational
intelligence methods. For domain experts (analysts and decision makers),
discovery and visualization of hidden patterns is important, but only as a
part of their decision making process. We provided a conceptual view of the
decision-making process, a structural model, and relevant visual aspects of
this process. A structural model of the decision-making process has been
linked with the visual discovery of its components and visual reasoning with
those components. This consideration has been illustrated with examples from
the USS Cole incident in 2000 and the cholera epidemic in London in 1854.

A task-specific visualization approach that is a part of the conceptual
view has been illustrated by describing conceptual differences between the
visual means needed to support three different tasks: decision making,
browsing, and searching.



10. EXERCISES AND PROBLEMS

1. Find a visualization of the NASA Columbia disaster in 2003 in the mass
media. Build an iconic structure (summary) of that visualization similar
to that presented in Figure 8 and identify the level of the visualization
(illustration, reasoning, decision making) for the iconic summary and the
original mass media image. If the illustration, reasoning, and decision
making levels do not fit, formulate your own level category.

2. This assignment differs from #1 in only one aspect: the visualization
should be taken not from mass media outlets but from NASA sources,
including an official report on the Columbia disaster.

3. Compare the levels of the visualizations in assignments 1 and 2 from a
decision making perspective. Discuss differences and similarities. Build
your own visualization for the Columbia disaster that is intended to support
decision making. Justify your design based on an analysis of the
deficiencies of the visualizations used in assignments #1 and #2.

Tip: Start from adapting Figures 10 - 12 to this assignment. 

4. Build a conceptual description of your design for assignment #3. 

Tip: Start from adapting Figure 9. 

5. Find or design a visualization that fits a searching task. Justify your
solution.

6. Find or design a visualization that fits a browsing task. Justify your
solution.

7. Find or design a visualization that fits a decision-making task. Justify
your solution.

8. Describe general steps of a task-driven visualization approach. Give one
example of such a visualization from the literature and design one example
on your own. Identify the general task-driven steps in both examples.

Advanced 

9. Elaborate conceptually the general steps of a task-driven visualization
approach: (1) analysis of a user task, (2) generation of equivalent
perceptual tasks that can be performed more efficiently, and (3) design of
accompanying graphics to support the efficiency of the perceptual task. Tip:
Your elaboration should decompose these three steps into smaller substeps.
Be specific and provide examples for the substeps.



Abstract: Visual decision-making involves a problem, users, and a software
distribution model. We describe the information visualization value stack and
identify a framework that defines problems where it creates significant value.
We also describe successful models that support visualization deployment and
characterize different types of visualization users.



Key words: Value model, applications, sweet spot. 



1. THE INFORMATION VISUALIZATION VALUE 
STACK PROBLEM 

Over the last decade information visualization has emerged as an exciting
research area that is addressing a significant problem: how to make sense of
the ever increasing amounts of information that have become widely available.
With the growth of networking and the decreasing cost of storage, it has
become technically feasible and cost effective to store and access vast sets
of information. The academic, business, and government challenge is how to
make sense of this information and translate the insights into value-producing
activities.

In a newly emerging field there will certainly be opportunities for
information visualization technology. There have already been some early
successes and also some failures. Unfortunately, not all information
visualizations create enough value for users to switch from conventional user
interfaces to new visual interfaces. This paper presents a simple framework
that predicts problem areas where information visualization will achieve
utilization, that is, be useful enough that users will adopt new visual
interfaces.

Information visualizations are exciting and the demos inevitably generate 
interest among potential users. Unfortunately, however, visualization, as ex- 
citing as it is, only involves the user interface or presentation layer in a tech- 
nology stack. Useful information applications solve problems that involve 
collecting data, manipulating it, organizing it, performing calculations, and 
finally presenting the results to users. The value of the application is cap- 
tured by the complete system. It is often the case that each system compo- 
nent individually is not particularly useful. For example, tires are not useful 
without a car, but better tires improve a car's performance. The presentation 
layer, like beauty, is only “skin deep” and the usefulness of the application 
comes from the whole solution and not just the “lipstick.” 

Thus, information visualization is naturally a feature of a system and is
rarely a complete application by itself. This, unfortunately, makes
utilization difficult. With a few exceptions, the technology must be part of
an application to capture sustainable value. Information visualization “makes
it better” but does not make it. The information visualization value stack
challenge is to find applications where information visualization creates
enough value, either by itself or as part of an application, to support
utilization.



2. WHERE DOES INFORMATION CREATE HIGH 
VALUE? 

At its most basic level, information visualization is a technique for help- 
ing analysts understand information. This section describes four classes of 
information problems, illustrated by examples, where visualizations create 
value. 

2.1 User interface is the application 

In certain cases, the user interface is essentially a complete application. 
The canonical example of this is computer games which are innovative and 
sophisticated user interfaces that involve, relatively speaking, little computa- 
tion and no data integration. Successful games must have a great user inter- 
face that challenges and engages prospective players within the first few 
seconds. 



2. Information visualization value stack model

Visual Presentation and Branding involves creating custom, 3D displays of
information for presentations that are visually exciting. It frequently
incorporates aspects of branding and has a high glitz and wow factor. Typical
presentation and branding techniques include animations and colorful 3D
displays.

Figure 1 shows two examples of information visualizations for presenta- 
tion and branding. The visualization on the left shows activity on the 
NASDAQ stock exchange and the visualization on the right shows website 
activity. 




Figure 1. Information visualizations for presentation and branding. Left:
NASDAQ display. Right: Visual Insights’ eBizLive product for showing website
activity. See also color plates.



Executive Dashboards provide decision-makers with instant access to 
key metrics that are relevant for particular tasks. Much of the intellectual 
content in dashboards is in the choices of metrics, organization of informa- 
tion on the screen, and access to supporting, more detailed information. In- 
formation visualization techniques improve this presentation. Executive 
dashboards may include the ability to export result-sets to other tools for 
deeper analysis. 

State-of-the-art implementations of active executive dashboards are
web-based, interactive, and dynamic, involve no client-side software to
install, and often include action alerts that fire when pre-defined events
occur. End user customizations include sorting, subsetting, rearranging
layouts on the screen, and the ability to include or exclude various metrics.
It is common for visual reports to be distributed via email, published on a
corporate intranet, or distributed through the internet.

Real-time Visual Reports are related to executive dashboards but provide an
active presentation of an information set consumable at a glance. Although
the distinction is subtle, visual reports usually involve fat client-side
software and thus can provide richer presentations of the information. Visual
reports exploit the idea that a picture is worth a thousand words and, in
particular, that for many tasks a picture is more useful than a large table of
numbers.

Visual reporting systems:

1. Are easy to use for both sophisticated and non-sophisticated user
communities,

2. Are suitable for broad deployments, and

3. Provide capabilities for flexible customization.







Figure 2. Executive Dashboard courtesy of Bill Wright. See also color plates. 



Visual reports, as with all reports, are a tool for assumptive-based analy- 
sis. Reports answer “point questions”: How much of a particular item is in 
stock? Where is it? How long will it take to get more? Reports are ideal for 
operational tasks, but do not provide full analytics, or enable an analyst to 
automatically discover new information that a user has not thought to ask 
about. 

This is a well-known characteristic of all report-based analytical solutions.
The reports pre-assume the relationships that are reported upon. The
difficulty with this approach is that most environments are too complex for a
pre-defined report or query to be exactly right. The important issues will
undoubtedly be slightly, but significantly, different. This is particularly
true for complex, turbulent environments where the future is uncertain. There
are two common solutions to this problem. The first is to create literally
hundreds of reports that are distributed to an organization, using either a
push distribution mechanism such as email or a pull mechanism involving a
web-based interface. The second involves adding a rich customization
capability to the reporting interface, which increases UI complexity.
Unfortunately, neither works particularly well. Although a report containing
novel information might exist, finding it is like finding a needle in a
haystack. Adding UI features makes the reporting system difficult to use for
non-specialists.




Figure 3. Real-time 3D Visual Report courtesy of Visual Insights.
See also color plates.



2.2 Information discovery applications for deep analysis 

Visual discovery-based analysis addresses the shortcomings of assumptive-based
analytics by providing a rich environment to support novel discovery. Systems
supporting visual discovery are used by analysts and frequently combine data
mining, aspects of statistics, and predictive analytics. Visual discovery is
domain specific and iterative. Information visualization improves visual
discovery by enabling discoveries to “jump” out, and these discoveries may
lead to “why” questions. For example, in a supply chain management analysis,
visual discovery might identify an unusual inventory condition that would lead
to a subsequent investigation into why it occurred and how to fix it.

ADVIZOR [Codd, 1977] is an example of a system for visual discovery. It
consisted of a workspace with standard data acquisition capabilities, and a
set of visual metaphors, e.g. views, each of which showed data in a particular
way. Some of the views were conventional (e.g., barchart, linechart, piechart)
and some were novel (Data Constellations, Multiscape, Data Sheet). For visual
analysis, the views could be combined into fixed arrangements called
perspectives.

Within any perspective the views could be linked in four ways: by color, 
focus, selection and exclusion. Components linked by color used common 
color scales and those linked by focus, selection and exclusion were tied by 
data table row state using a case-based model [Eick, 2000a]. 




Figure 4. Advizor 2000 Visual discovery and analysis tool.
See also color plates.

ADVIZOR contained three interesting ideas:

1. Perspectives extend general linked view analysis systems by reducing
complexity for non-expert users. Perspectives are “authored” by “power users”
who are ADVIZOR experts. Analysts, who are domain experts, but not power
users, use the perspectives as a starting point for analysis and as a guiding
framework. The output from their analysis, visual reports, may be published
and distributed for use by casual users, executives, and decision-makers. The
ADVIZOR user model is similar to that employed by spreadsheets, where there
are spreadsheet authors, users, and consumers.

2. Visual Design Patterns are recurring patterns within perspectives 
that are broadly useful and apply to many similar problems. Follow- 
ing the object-oriented programming community [Eick, 2000b], rec- 
ognizing, cataloging, and reusing design patterns have the potential 
for significantly improving information visualizations. 

An example of a design pattern is Shneiderman's Information-seeking Mantra:
Overview, Zoom, Filter, Details on Demand [Card, 1999]. The overview shows the
entire dataset, e.g. all movies in the dataset, and supports the ability to
zoom in on interesting movies and query the display with the mouse to extract
additional details. This design pattern incorporates interactive filters,
frequently bar and pie charts, that enable you to filter out uninteresting
folders so that you display only the data that is interesting. Filtering might
be by category, numeric range, or even selected value.

Another design pattern, called Linked Bar Charts, is particularly strong for
data tables containing categorical data. Categorical data, sometimes called
contingency tables, involves counts of the number of data items organized in
various bins or subcategories. This design pattern employs one bar plot for
each categorical column, with the height of each bar tied to the number of
rows having that particular value. In statistical terms, each of the bar
charts shows a marginal distribution. As the user selects an individual bar,
the display recalculates to show one-way interactions. Using exclusion and
selection shows two-way interactions.

3. Visual Scalability, the third interesting idea, involved the realization
that each visual metaphor has an inherent scalability limitation [Eick, 2003]:
a large enough dataset will overwhelm any visual technique. Scalability is
determined by controllable factors such as visualization technique, monitor
resolution, computing environment, data structures, and algorithms, and by
uncontrollable factors such as human perception.

The important and exciting observation is that there is a set of 
tools and techniques that can be used to “ratchet up” the scalability 
of any visual component. These include interactivity, panning and 
zooming techniques, identification and selection, focus and context, 
multi-resolution visual metaphors, automatic aggregation, and im- 
proved labeling algorithms. When combined, the tools can give 
rather stunning increases in visual scalability, e.g. two to three or- 
ders of magnitude. See Figure 5. 




Figure 5. Bar chart scalability is increased by using levels of rendering detail and a red over- 
plotting indicator at the top of the view. Scalability in this case facilitates locating and then 
focusing attention on particular bars. 

See also color plates. 
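
The Linked Bar Charts design pattern described above reduces to counting rows
per category and recounting after a selection. The sketch below illustrates
that computation only. The sample rows, column names, and the function
bar_heights are assumptions made for illustration and do not reproduce
ADVIZOR's row-state model.

```python
from collections import Counter

# Hypothetical data table with two categorical columns.
rows = [
    {"region": "East", "product": "A"},
    {"region": "East", "product": "B"},
    {"region": "West", "product": "A"},
    {"region": "West", "product": "A"},
    {"region": "East", "product": "A"},
]

def bar_heights(rows, columns, selection=None):
    """One bar chart per categorical column: bar height is the number of rows
    having that value. `selection` maps column -> selected value and restricts
    the rows counted, mimicking a click on an individual bar."""
    selection = selection or {}
    kept = [r for r in rows
            if all(r[col] == value for col, value in selection.items())]
    return {col: Counter(r[col] for r in kept) for col in columns}

columns = ["region", "product"]

# With no selection, each chart shows a marginal distribution.
marginals = bar_heights(rows, columns)

# Selecting the "East" bar recalculates the other chart for that subset.
linked = bar_heights(rows, columns, {"region": "East"})

print(marginals["product"], linked["product"])
```

Comparing the product chart before and after the selection shows how the
product mix within East differs from the overall mix, which is the kind of
interaction the linked display is meant to reveal.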



2.3 Information visualizations for searching and 
exploration 

Information visualizations focused on visual searching involve undirected
knowledge discovery against massive quantities of uncategorized,
heterogeneous data with varying complexity. This scenario is typical of web
searching, where users recognize information when they find it. Searches are
iterative, intuitive, and involve successive refinements.

The key measures of the performance of visual searching systems revolve
around the amount of information found per unit of search effort expended. The
search effort may be measured in user time, number of searches, personal
energy, etc. The results, or information found, may be measured in articles,
references, relevance, novelty, ease of understanding, etc. Different systems
exploit various design points, trading off these factors.

2.4 Task-specific visualizations 

Task-specific visualizations help users solve critical, high-value tasks. 
Examples include visualizations to: 

1. Execute on-line equity trades (Figure 6), 

2. Manage complex communications networks, and 

3. Operate nuclear power plants. 

These visualizations are tuned to particular problems and are often delivered
as part of a complex system. They are highly valuable, frequently involve
fusing a large number of information streams, and serve both as an output
presentation for information display and as a control panel and input
interface for user operations.




Figure 6. Information visualization for on-line stock trading



3. INFORMATION VISUALIZATION SWEET SPOT
MODEL

Information visualization problems can be defined by three dimensions:¹

1. Dataset size is a measure of the total amount of data to be analyzed.
Although some might disagree, information visualization techniques are not
needed for small datasets containing tens to hundreds or perhaps even a few
thousand observations. In these cases reports, spreadsheet graphics, and
standard techniques work fine. More powerful techniques are unneeded.

Conversely, information visualization techniques do not scale to analyze
massive datasets containing gigabytes of information. The basic problem is
that information visualization is a technique that makes human analysts more
efficient, and human scalability is quite limited. The exact scalability
limits of information visualization are subject to debate and are an active
research area [Crow, Lantrip, Pennock, Pottier, Schur, Thomas, et al., 1995;
Eick, 2003]. Most researchers would agree, however, that massive datasets
containing hundreds of thousands to millions of observations are too big and
need to be subdivided, aggregated, or in some way reduced before the
information can be presented visually. Information visualization, it would
seem, cannot be applied to analyze massive image databases containing millions
of images, but might be applied to metadata associated with the images.

2. Dataset complexity can be measured by the number of dimensions, structure,
or richness of the data. Information visualizations are not needed for (even
large) simple datasets with low dimensional complexity. Statistical reduction
tools such as regression work fine and are sufficient in this situation.

Conversely, datasets of massive complexity containing thousands of dimensions
are too complex for humans and thus for information visualizations. Some have
argued that information visualizations can cope with as many as fifty
dimensions, although a more practical upper limit is, say, half a dozen to a
dozen dimensions.

3. Dataset change rate is a measure of how frequently the underlying problem
changes. Static problems, even very complex ones, can eventually be solved by
developing an algorithmic solution. The algorithmic solution has a huge
advantage over an information visualization-based solution, since the
algorithm can be applied repeatedly without the need for expensive human
analysts.

Conversely, analysis problems involving change or other dynamic
characteristics are extremely difficult to automate because the problem keeps
moving. In these cases human insight is essential. Humans, however, cannot
cope with problems that change too quickly. We are incapable of instantaneous
responses. Human analytical problem solving occurs on a time scale of minutes
to months. We must automate problems needing faster response and partition
those involving longer time scales.

¹ The original version of this idea is due to Doug Cogswell.



Table 1. Attributes with low and high values

Attribute             Low value                High value
Dataset size          tens to a few thousand   10^5 to 10^6
Dataset complexity    2 or 3 dimensions        50 dimensions
Dataset change rate   minutes                  months
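
The three dimensions and the low and high values in Table 1 can be read as a
simple screening test for whether a problem is a candidate for information
visualization. The sketch below is one possible reading. The exact cut-off
numbers (for example, treating "months" as 90 days) and the function name
sweet_spot_issues are assumptions for illustration, not thresholds stated by
the model.

```python
from datetime import timedelta

# Illustrative cut-offs loosely based on Table 1 (assumed, not prescribed).
SIZE_RANGE = (10**3, 10**5)          # number of observations
DIMENSION_RANGE = (3, 50)            # number of dimensions
CHANGE_RANGE = (timedelta(minutes=1), timedelta(days=90))

def sweet_spot_issues(n_observations, n_dimensions, change_period):
    """Return reasons why a problem falls outside the sweet spot.
    An empty list means all three dimensions are in the moderate range."""
    issues = []
    if n_observations < SIZE_RANGE[0]:
        issues.append("dataset small enough for reports and spreadsheets")
    elif n_observations > SIZE_RANGE[1]:
        issues.append("dataset must be aggregated or reduced first")
    if n_dimensions < DIMENSION_RANGE[0]:
        issues.append("simple enough for statistical reduction tools")
    elif n_dimensions > DIMENSION_RANGE[1]:
        issues.append("too many dimensions for a human analyst")
    if change_period < CHANGE_RANGE[0]:
        issues.append("changes too fast for humans; automate")
    elif change_period > CHANGE_RANGE[1]:
        issues.append("nearly static; an algorithmic solution is cheaper")
    return issues

moderate = sweet_spot_issues(20_000, 8, timedelta(days=1))
trivial = sweet_spot_issues(50, 2, timedelta(days=365))
print(moderate, trivial)
```

A problem with an empty issue list is only a candidate. The model's further
condition, that the problem is not easily automated or otherwise needs human
involvement, is a judgment that the sketch does not capture.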



As shown in Table 1 the sweet spot for information visualization in- 
volves analysis problems of moderate data sizes, rich, but not overwhelming, 
dimensional structure, that change, are not easily automated, or for some 
reason need human involvement. Examples of prototypical applications in- 
clude the following. 

Network Management involves complex networks where the system is dynamic,
constantly changing with new protocols, new devices, and new applications. The
systems are instrumented and collect alarms with complex dimensional
structure. The number of events (alarms) frequently exceeds the capacity of
network visualizations, so the events must be algorithmically reduced.

Customer Behavior involving human buying patterns and transaction 
analysis is an ideal candidate for information visualizations. Human 
behavior is complex, unpredictable, and dynamic. Furthermore, although 
aggregate numbers of transactions are large, for any individual or set of 
individuals the number of transactions is not overwhelming and is easily 
suitable for analysis. 

Intelligence Analysis is an ideal candidate for information visualization. 
It is difficult to automate, involves complex dimensional data, is dynamic, 
and necessarily involves human analysts. 

4. SUCCESSFUL DEPLOYMENT MODELS FOR 
INFORMATION VISUALIZATIONS 

Successfully deploying information visualizations involves solving a 
technical problem and creating a business model that supports widespread 
utilization. Broadly speaking, there are three classes of business models for 
software companies. 

Custom software is written to solve a specific problem, usually for a 
single customer. The problem being addressed must be significant, valuable, 
important, and yet specialized enough so that general solutions do not exist. 
The projects often involve next generation technology and new approaches 
to problems. 

Typical price points for custom software projects start at $250K. 
Custom software is sold directly by the vendor with a six-month to two-year 
sales cycle. The sales team is highly specialized and the sales process 
frequently involves company executives. 

Organizations involved with custom software include universities, 
government labs, large commercial organizations, and boutique specialty shops. 
Although it might seem surprising to some, research universities and 
government labs are custom software developers: the funding agencies 
effectively hire university principal investigators, using BAAs and 
solicitations, to solve important custom problems. In this setting the 
principal investigators function as sales professionals and also lead 
fulfillment efforts with “graduate student” development teams. 

In the large organizations that sponsor custom software development 
there are commonly multiple roles. It is often the case, particularly with 
government-sponsored projects, that the funding organization is not the 
organization that will eventually use the software, and the users of the 
software may not receive the value from its use. These separate 
organizational roles complicate the software sales process. For example, the 
National Science Foundation funds research to build software for scientists 
to use. Scientists use the software to solve important national problems. 
Thus citizens are the ultimate beneficiaries. In the commercial environment 
the CFO (Chief Financial Officer) funds a project that is implemented by the 
CIO (Chief Information Officer) for a business unit. Thus three 
organizations are involved. 

Enterprise software, sold by commercial companies, is essentially a 
flexible template that is “implemented” on site, either by the vendor or a 
“business partner.” In the implementation phase the template is customized 
for a particular customer by connecting data sources, defining the specific 
reports a company needs, populating tables (e.g., inserting employee names 
into a payroll file), etc. For an enterprise application, data integration is 
essential. 

Since enterprise software is reusable, it can be sold more economically 
than custom software. Generally, price points for enterprise software range 
from $25K to $250K. The sales model for enterprise software may be direct at 
the higher price points, e.g. SAP, or through local business partners who are 
“certified” by the vendor. 

Shrink-wrap software is highly functional software that solves a spe- 
cific problem very well. The software usually is customer installed and pro- 
vides for little or no customization. Customer support, if provided, is usually 
self-serve via a web site or perhaps with limited help desk support. 

Shrink-wrap software is almost always sold by distributors² or OEMed to 
hardware vendors and sold as part of a bundle. As a mass-market item, 
the price point for shrink-wrap software is less than $25K and more 
frequently less than $1K. 

Relating the business models back to the visualization deployments, most 
of the demand for information visualization has been met with custom re- 
search software built by universities, government labs, and large communi- 
cations companies. The customers are the military, intelligence community, 
biomedical researchers, and other highly specialized users. Demand for in- 
formation visualization within the research community is healthy. 

Within the enterprise category we might expect information visualiza- 
tion-enabled applications to emerge. In this category, the value is provided 
by the whole application and an information visualization presentation layer 
could be described as a software feature or add-on product. Within this cate- 
gory there have been some early successes. Perhaps the most notable success 
has been Cognos Visualizer. Cognos sold 300K³ units of Cognos Visualizer, 
an add-on for Cognos PowerPlay, at $695 per unit and some of the other 
“Business Intelligence” software vendors have had similar experiences. 

The “Gorilla” analytic application within the shrink-wrap category for 
visualization is Microsoft Excel. It is generally considered to be good 
enough for 90% of problems and essentially everybody has it. 



5. USERS OF VISUALIZATION SOFTWARE 

There are three broad classes of potential information visualization users: 
scientists, analysts (both intelligence and commercial analysts), and 
business users. 



²Microsoft, the largest producer of shrink-wrap software, sells essentially all of its software 
through distributors. 

³For comparison, a software application that sold 2,000 to 5,000 units would generally be 
considered successful. 




Scientists have deep needs for information visualization, are extremely 
technical, and work on the most significant problems. They want powerful 
tools for cutting-edge analyses. 

Analysts, particularly in commercial companies, also have a strong need 
for information visualization, but tend to have specialized needs. They are 
not as sophisticated as scientists are and will not tolerate raw software pack- 
ages. 

Business users need simple information visualizations and are easily 
frustrated by complex software. Business users are numerous and have budgets, 
but they need solutions to problems and are not inherently interested in the 
complexity that excites scientists and analysts. 

These three classes of users have different needs and varying tolerances 
for complex software. A research challenge is to create software that is so- 
phisticated enough to solve scientific problems and yet easy to use for busi- 
ness users. 



6. CONCLUSION 

Information visualization involves the presentation layer, which is naturally 
a feature of many products. By itself, it usually has insufficient value to 
support widespread usage and deployment. 

It is generally a feature of an application and a critical component of a so- 
lution. This paper describes several of these cases, illustrates them with ex- 
amples, and defines a simple model for information visualization's sweet 
spot. 

8. EXERCISES AND PROBLEMS 



1. Identify other factors that might be included in the information 
visualization value stack model. 

Advanced 

2. Extend the information visualization value stack model to ASP-based 
applications. 

3. Formalize the information visualization value stack model. 



Chapter 3 

VISUAL REASONING AND REPRESENTATION 

Abstract: Reasoning plays a critical role in decision making and problem solving. 
This chapter provides a comparative analysis of visual and verbal (sentential) 
reasoning approaches and their combination, called heterogeneous reasoning. It 
is augmented with a description of application domains of visual reasoning. 
Specifics of iconic, diagrammatic, heterogeneous, graph-based, and geometric 
reasoning approaches are described. Next, explanatory (abductive) and 
deductive reasoning are identified and their relations with visual reasoning 
are explored. The rest of the chapter presents a summary of human and 
model-based reasoning with images and text. Issues considered include 
cognitive operations, the difference between human visual and spatial 
reasoning, and image representation. One of our main statements in this 
chapter is that the fundamental iconic reasoning approach proclaimed by 
Charles Peirce is the most comprehensive heterogeneous reasoning approach. 

Key words: Visual reasoning, spatial reasoning, heterogeneous reasoning, iconic 
reasoning, explanatory reasoning, geometric reasoning. 



The words or the language, as they are written or spoken, 
do not seem to play any role in my mechanism of thought. 

Albert Einstein 



1. VISUAL VS. VERBAL REASONING 

Scientists such as Bohr, Boltzmann, Einstein, Faraday, Feynman, 
Heisenberg, Helmholtz, Herschel, Kekule, Maxwell, Poincare, Tesla, Wat- 
son, and Watt have declared the fundamental role that images played in their 
most creative thinking [Thagard & Cameron, 1997; Hadamard, 1954; 
Shepard & Cooper, 1982]. 

Problem solving and decision making are based on reasoning, where the 
result of such reasoning is a solution or decision. Herbert Simon [Simon, 
1995] pointed out that Aristotelian logic and Euclidean geometry were major 
and abiding contributions of the Greeks to reasoning in language (natural or 
formal) and to drawing inferences from diagrams and other pictorial sources 
to solve problems of logic and geometry. Note that despite the frequent 
references to Greek mathematics as an origin of visual reasoning, the Chinese 
and Indians knew a visual proof of the Pythagorean Theorem in 600 B.C., 
before it was known to the Greeks. This visual proof is shown in Figure 1 
[Kulpa, 1994]. 




Figure 1. Pythagorean Theorem known to the Chinese and Indians [Kulpa, 1994] 

It is widely acknowledged that visual diagrammatic, iconic representa- 
tions of reasoning are better understood than verbal explanations because 
diagrams and icons are symbols that better resemble what they represent 
than text [Thagard & Cameron, 1997]. 

Simon [Simon, 1995] discusses the heuristic nature of human reasoning 
in problem solving as an argument for using non-verbal visual reasoning 
with diagrams as a tool to foster reasoning and to find answers. According 
to Simon, traditional reasoning in non-visual formal logic is much more 
helpful in testing the correctness of the reasoning than in identifying the 
statement to be inferred (finding a solution). 

Another direction for a visual approach is to foster problem finding in 
the space of alternative problems, as opposed to the problem solving 
discussed above. Einstein and Infeld [Einstein & Infeld, 1938] stated: 

The formulation of a problem is often more essential than its solution, 
which may be merely a matter of mathematical or experimental skill. To 
raise new questions, new possibilities, to regard old problems from a new 
angle, requires creative imagination and marks real advance in science. 

It is well known that this process is extremely informal, not only in 
scientific discovery but also in design (architectural and others). Several 
attempts have been made to provide a comprehensive structural picture of this 
informal visual process. Hepting [Hepting, 1999] formulated this picture as a 
collection of 16 design principles. We structured them into two processing 
categories: 

I. Support for a model of problem solving by: (1) supporting restruc- 
turing and reorganizing of alternatives, (2) encouraging the user to 
discover things personally, (3) supporting the systematic exploration 
of a conceptual space, (4) allowing the user to concentrate on the 
task, (5) providing an external representation of the possible choices, 
and (6) allowing each user to apply personal judgment. 

II. Technical tools to support problem solving by: (1) supporting the 
user in working directly with images, (2) supporting the user in com- 
bining images, and their elements, (3) allowing the user to employ 
heuristic search techniques, (4) recording all aspects of the visual 
representations of the design, (5) supporting collaboration, (6) assist- 
ing the user in choosing images, (7) allowing the user to interact with 
the images, (8) supporting multiple visual representations, and (9) 
supporting the use of current solutions in future enquiries. 

Once again, this list shows that the process is very informal (and 
probably sometimes illogical). It also illustrates the huge role of visual 
reasoning in all stages of the design process. 

Another argument for visual reasoning is that more than one medium, 
including text and pictures, provides information for reasoning, but formal 
logic is limited to sentences [Shin & Lemon, 2003]. This argument is used 
to support both pure visual reasoning and heterogeneous reasoning 
that combines text, pictures, and potentially any other medium [Barwise & 
Etchemendy, 1995]. 

The next argument for visual and heterogeneous reasoning is related to 
the speed and complexity of reasoning. Reasoning with diagrams, without 
re-expressing the diagrams in the form of sentences, can be simpler 
and faster. It avoids an unnecessary and non-trivial information conversion 
process by working directly with heterogeneous rules of inference, e.g., 
First Order Logic and Euler/Venn reasoning [Swoboda & Allwein, 2002; 
Swoboda & Barwise, 2002]. 



2. ICONIC REASONING 



One of the founders of modern formal logic, Charles Peirce (1839-1914), 
argued long ago for the use of visual inference structures and processes. 
Recently it has become increasingly clear that one of the fundamental 
difficulties of automatic computer reasoning is that it is extremely hard to 
incorporate the human observational and iconic part of the reasoning 
process into computer programs. Indeed, the extreme opinion is that it is 
simply impossible [Tiercelin, 1995]. 

Peirce stated that in order for symbols to convey any information, indices 
and icons must accompany them [Peirce, 1976; Hartshorne, Weiss & Burks, 
1958; Robin, 1967; Tiercelin, 1995]. It is probably not accidental that in 
addition to being a logician and a mathematician, Peirce was also a land 
surveyor at the U.S. Coast and Geodetic Survey. In this capacity, he would 
have had first-hand experience with real-world visual and spatial geographic 
reasoning. We should note here that several chapters in this book deal with 
visual and spatial reasoning and problem solving related to combining 
geographic maps and aerial and satellite photos. Charles Peirce distinguished 
the iconic, indexical, and symbolic functions of signs. Table 1, based on 
[Tiercelin, 1995], summarizes Peirce’s view of icons and diagrams. 

Table 1. Peirce’s concepts of icons and diagrams 

Icons      Icon: an object that may be purely fiction, but must be 
           logically possible. 

           Main icon function: exemplification or exhibition of an 
           object (its characteristics). 

           Secondary icon function: resemblance to the object. 

           By direct observation of an icon, other truths concerning 
           its object can be discovered. 

           The ideal iconic experimentation warrants an accord between 
           the model and the original. 

           Icons are formal and not merely empirical images of the things. 

           Icons represent the formal side of things. 

Diagrams   Species of icons. 

According to Peirce, in general, mathematical reasoning and deduction 
involve appeal to the observation of “iconic” representations and cannot 
be reduced to purely symbolic (i.e., rule-governed) transformations [Peirce, 
1933; Ransdell, 1998]. In modern studies, the term “diagrammatic” has 
largely replaced Peirce’s term “iconic”. Peirce stated that all thinking is in 
signs, and that signs can be icons, indices, or symbols [Thagard & Cameron, 
1997; Goudge, 1950; Hartshorne & Weiss, 1958]. We believe that Peirce’s 
original term “iconic” has an important flavor that has faded in the modern, 
more “scientific” term “diagrammatic”. We discuss iconic representations 
further in Chapters 9 and 10. 

Diagrammatic reasoning is an active area of research with a variety of 
subjects being considered [Shin & Lemon, 2003; Hegarty, Meyer & 
Narayanan, 2002; Chandrasekaran, Josephson, Banerjee, Kurup & Winkler, 2002; 
Chandrasekaran, 2002; Chandrasekaran, 1997; Anderson, Meyer & Olivier, 
2002; Anderson, 1999; Magnani, Nersessian & Pizzi, 2002; Glasgow, 
Chandrasekaran & Narayanan, 1995; Chandrasekaran & Narayanan, 1990; 
Barwise & Etchemendy, 1999; Barwise & Etchemendy, 1998; Barwise & 
Etchemendy, 1995; Swoboda & Allwein, 2002; D’Hanis, 2002; Shimojima, 
2002; Magnani, 1999]. 

Applications of diagrammatic reasoning range from teaching introductory 
logic classes [Barwise & Etchemendy, 1994] to motions of vehicles and 
individuals engaged in a military exercise [Chandrasekaran, et al., 2002]. 

There is no full consensus on the definition of a diagram. Shin and 
Lemon [Shin & Lemon, 2003] consider any picture to be a diagram and 
distinguish diagrams as either internal mental representations or external 
representations. According to cognitive science, reasoning as performed by 
humans involves both types of diagrams. 

It has also been shown that diagrammatic systems can have the same 
logical status as traditional linear proof calculi [Shin & Lemon, 2003; 
Barwise & Etchemendy, 1995]. This is an important result, because 
traditionally diagrams have been considered a heuristic tool for discovering, 
explaining, and exploring a proof, but not a part of the proof itself 
[Greaves, 2002; Shin & Lemon, 2003]. 

Diagrammatic reasoning is part of a wider field of study known as 
heterogeneous reasoning [Barwise & Etchemendy, 1995] (discussed in Section 
4 and Chapter 4), which itself is part of an even more general study known as 
visual explanatory (abductive) reasoning. 

The history of mathematics has seen several successful diagrammatic 
systems: Euler’s circles, Venn diagrams, Lewis Carroll's squares, and 
Charles Peirce’s existential graphs [Euler, 1768; Venn, 1881; Carroll, 1896; 
Peirce, 1933; Shin & Lemon, 2003]. Some of these systems have the 
considerable expressive power of a first-order logic language, which 
stimulates their use. In the next two sections, we discuss the Euler and Venn 
systems. 

Here it is appropriate to contrast the dimensionality of sentential and 
diagrammatic examples. We note that sentential languages based on acoustic 
signals are sequential in nature, whereas diagrams, being inherently 
two-dimensional, are able to display some relationships without the 
intervention of a complex syntax [Shin & Lemon, 2003; Stenning & Lemon, 2001]. 
This is in general true for symbolic logic languages, but for some natural 
languages, especially hieroglyphic languages, a sequential one-dimensional 
nature does not always hold. Individual hieroglyphs and their composition can 
be two-dimensional. Chapter 7 (Sections 2, 3) provides such examples. 

Table 2, based on [Shin & Lemon, 2003; Stenning & Lemon, 2001; Barwise & 
Shimojima, 1995; Thagard & Cameron, 1997], summarizes strengths and 
weaknesses of diagrams. 

Table 2. Strengths and weaknesses of diagrammatic reasoning 

Strength: More flexibility in comparison with sequential one-dimensional 
languages (natural languages, symbolic logic) because of the ability to use 
intuitive two-dimensional spatial relations to represent relations between 
objects. Example: Euler diagram. 
Weakness: Less flexibility in comparison with 3-D languages. Exploits only 
2-dimensional spatial relations. Cannot exploit non-planar 3-D and higher 
dimensional relations. Example: Non-planar graphs. 

Strength: More compact representation in comparison with sequential 
languages because it can exploit a less formal grammar. Sequential languages 
need to encode two-dimensional spatial relations in a one-dimensional 
language that requires a more complex and elaborate syntax. Example: Euler 
Circles (see section 2.2 in this chapter). 
Weakness: Often less compact representation in comparison with sequential 
languages in representations of abstract objects and relationships. 

Strength: Faster reasoning (“free ride” [Barwise & Shimojima, 1995]) about 
two-dimensional spatial relations in comparison with sequential languages. 
Humans can read off 2-D spatial relations directly from 2-D sentences 
(diagrams) without reasoning. Example: A map vs. a verbal description of a 
landscape. 
Weakness: More effort is needed to extract relations between objects from a 
2-D sentence (diagram) than from an ordinary one-dimensional sentence. Many 
2-D sentences (diagrams) are incomplete, and the complexity of inference with 
the diagrams is NP-hard. 

Strength: Faster discovery of causal relations in comparison with a 
sequential verbal rule-based representation. The latter may require a longer 
search. 
Weakness: Can discover irrelevant (non-causal) relations if these relations 
are visualized and read. 

Strength: Often better understood than verbal, sentential explanations 
because icons better resemble what they represent. 
Weakness: Can be less understood if irrelevant relations are visualized and 
read. 



3. DIAGRAMMATIC REASONING 

3.1 Euler Diagrams 

One of the most productive mathematicians, Leonhard Euler (1707-1783), 
invented what we now call Euler diagrams [Euler, 1768]. Below we 
present Euler's own examples (see Figure 2), adapted from [Shin & Lemon, 
2003]. 

Example 1. All A are B. All C are A. Therefore, all C are B. 

This example can be rewritten as follows: 



IF All A are B and All C are A THEN all C are B. 

Figure 2. Euler Diagram 

More exactly, this is equivalent to two formal sentences. The first sentence 
is in the set-theoretical form and the second one is in the predicate form: 

$$\forall x\,[(A \ni x \Rightarrow B \ni x) \;\&\; (C \ni x \Rightarrow A \ni x)] \Rightarrow (C \ni x \Rightarrow B \ni x) \qquad (1)$$

$$\forall x\,[(A(x) \Rightarrow B(x)) \;\&\; (C(x) \Rightarrow A(x))] \Rightarrow (C(x) \Rightarrow B(x)) \qquad (2)$$

Statements (1) and (2) can be formally proved in classical first order logic. 
The proof does not require a mathematician to be involved since it can be 
carried out by an automatic theorem prover, a computer program. 
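
As a small illustration of such a mechanical check, the predicate form (2) can be verified for an arbitrary object x by exhausting the eight possible truth values of A(x), B(x), and C(x). The sketch below is a brute-force truth-table test, not a general first-order theorem prover; the function names are hypothetical.

```python
from itertools import product


def implies(p, q):
    """Material implication p => q."""
    return (not p) or q


def syllogism_holds(a, b, c):
    """Statement (2) for one object x, given truth values of A(x), B(x), C(x)."""
    premises = implies(a, b) and implies(c, a)
    return implies(premises, implies(c, b))


# Check every one of the 2^3 = 8 truth assignments.
is_valid = all(syllogism_holds(a, b, c)
               for a, b, c in product([False, True], repeat=3))
print(is_valid)
```

The exhaustive check suffices here only because statement (2) contains a single universally quantified variable and no nested quantifiers.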

The Euler diagram above does not itself provide a formal proof, but it 
makes it obvious to a human that the statement is true. Human visual 
inspection of this diagram goes through the following steps: 

1) match the statement “All A are B” with two circles A and B and 
confirm that A is nested in B (this can also be done by a computer 
program that takes the diagram as input in a vector file format), 

2) match the statement “All C are B” with two circles C and B and 
confirm that C is nested in B, 

3) test visually that circle C is nested in circle A. 

These steps are not shown in the diagram and must be learned. Thus, the 
diagram itself does not provide a proof, but makes the proof easier to 
conceptualize. We “animate” the proof process to make the recording of 
these steps explicit and clear (see Figure 3). 
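
The nesting tests in these steps are simple to express for a program that receives the circles in vector form. The following is a minimal sketch under the assumption that each circle is given by a center and a radius; the `Circle` type and the coordinates are made up for illustration and are not taken from Figure 2.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Circle:
    x: float  # center x
    y: float  # center y
    r: float  # radius


def is_nested(inner, outer):
    """True if `inner` lies entirely inside `outer` (touching is allowed)."""
    distance = math.hypot(inner.x - outer.x, inner.y - outer.y)
    return distance + inner.r <= outer.r


# Hypothetical coordinates for three nested circles.
B = Circle(0.0, 0.0, 10.0)
A = Circle(1.0, 0.0, 6.0)
C = Circle(2.0, 0.0, 3.0)

checks = [is_nested(A, B), is_nested(C, B), is_nested(C, A)]
print(checks)
```

A circle is inside another exactly when the distance between the centers plus the inner radius does not exceed the outer radius, which is the only geometric fact the sketch relies on.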




IF (All A are in B) & (All C are in A) THEN (All C are in B) 



Figure 3. Reasoning with Euler diagrams 

The power of this representation lies in the following facts [Shin & 
Lemon, 2003]: 

• Object membership is easily conceptualized by the object lying in- 
side a set. 

• Set relationships are represented by the same relationships among 
the circles. 

• Every object x in the domain is assigned a unique location in the re- 
gion R. 

• Conventions above are sufficient to establish the meanings of these 
circle diagrams. 

Now let us consider another original Euler example that involves an 
existential statement, “Some A is B”, presented in [Shin & Lemon, 2003]. 

Example 2. No A is B. Some C is A. Therefore, some C is not B. 

Euler’s solution is shown in Figure 4. 




Figure 4. Euler solution 



The idea of this solution comes from the fact that C may have several 
alternative relationships with A and B. Figure 4 shows three such cases: 

1. C overlaps with A without overlapping with B, 

2. C contains B and overlaps with A, 

3. C overlaps with both A and B. 

There are many other possible relationships of C with A and B. To identify 
them we can construct a 6-dimensional Boolean vector $(x_1, x_2, \ldots, x_6)$, where 

$x_1 = 1$ if C overlaps with A, but neither contains A nor is contained in A, 
$x_2 = 1$ if C overlaps with B, but neither contains B nor is contained in B, 
$x_3 = 1$ if C contains A, 
$x_4 = 1$ if C contains B, 
$x_5 = 1$ if C is contained in A, 
$x_6 = 1$ if C is contained in B. 

Each $x_i$ is equal to 0 if the respective condition is not true. Potentially 
we have $2^6 = 64$ combinations, not just the three cases Euler listed. Some 
of them are not possible or do not satisfy the premises of Example 2, but 
obviously more than three cases actually satisfy the premises. 
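
One way to explore which of these vectors can occur is to enumerate small set-theoretic models rather than drawings. The sketch below is an illustration under stated assumptions, not Euler's procedure: A, B, and C are modeled as sets of occupied regions of a three-set layout, all three sets are assumed non-empty, “some” is read as the modern existential quantifier, and the six components are computed from set relations instead of circle geometry. All names are hypothetical.

```python
from itertools import product

# The seven regions of a three-set layout, keyed by membership in (A, B, C).
REGIONS = [r for r in product([0, 1], repeat=3) if any(r)]


def relation_vector(A, B, C):
    """Return (x1, ..., x6) describing how C relates to A and to B."""
    def rel(S):
        overlap = bool(C & S)
        contains = S <= C      # C contains S
        inside = C <= S        # C is contained in S
        return (overlap and not contains and not inside, contains, inside)

    o_a, c_a, i_a = rel(A)
    o_b, c_b, i_b = rel(B)
    return tuple(int(v) for v in (o_a, o_b, c_a, c_b, i_a, i_b))


vectors = set()
conclusion_always_holds = True
for mask in product([False, True], repeat=len(REGIONS)):
    occupied = {r for r, on in zip(REGIONS, mask) if on}
    A = {r for r in occupied if r[0]}
    B = {r for r in occupied if r[1]}
    C = {r for r in occupied if r[2]}
    if not (A and B and C):
        continue                    # assumption: all three sets are non-empty
    if A & B or not (C & A):
        continue                    # premises: "No A is B" and "Some C is A"
    conclusion_always_holds &= bool(C - B)   # conclusion: "Some C is not B"
    vectors.add(relation_vector(A, B, C))

print(conclusion_always_holds, len(vectors))
for v in sorted(vectors):
    print(v)
```

Under these assumptions the conclusion holds in every model that satisfies the premises, and eight distinct vectors arise, among them the three cases drawn by Euler, (1,0,0,0,0,0), (1,0,0,1,0,0), and (1,1,0,0,0,0). The vector (0,0,1,0,1,0) corresponds to the degenerate case C = A, where C both contains and is contained in A.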

Before analyzing some of these other cases, it is instructive to consider 
Euler’s probable view of the “some” quantifier. It appears that Euler did 
not interpret “some” as the modern existential quantifier $\exists$, which 
assumes that if something is true for all it is also true for some: 
$\forall x\, P(x) \Rightarrow \exists x\, P(x)$. It seems that Euler’s “some” is 
“only for some”; that is, if P(x) is true only for some x, then another y 
exists for which P(y) is false: $\text{Some}\ x\, P(x) \Rightarrow \exists y\, \neg P(y)$. 

Now let us turn back to those cases not drawn by Euler. For instance, the 
case <000011>, where both A and B contain C, is not possible, as that would 
contradict the premise “No A is B”. Another case not drawn in Euler’s proof 
is obtained when C contains both A and B, <001100>. Under our usual 
understanding of “some” this should be drawn, since if A and B do not overlap 
this case satisfies both premises, “No A is B” and “Some C is A”, because all 
A is C. Yet under Euler’s probable interpretation of “some”, this case would 
not be drawn, as there is no x in A that is not also in C. Now we want to 
check whether Euler’s solution is complete assuming the interpretation of 
“some” given above. Shin and Lemon [Shin & Lemon, 2003] do not accept Euler’s 
solution: 

However it is far from being visually clear how the first two cases lead a 
user to reading off this proposition, since a user might read off “No C is 
B” from case 1 and “All B is C” from case 2. The third diagram could be 
read off as “Some B is A,” “Some A is not B,” and “Some B is not A” as 
well as “Some A is B.” 

We disagree with this argument. Shin and Lemon accepted the first Euler 
example shown in Figure 2 as delivering clear statements about relations 
between sets. Yet we can add more nested circles to Figure 2, and it will no 
longer be clear which nested relation the user may want to read off. Recall 
that we added a guide for the user, a reasoning diagram displayed with the 
Euler diagrams (Figure 3). For Example 2, perhaps the guide should at least 
be a verbal explanation that Euler wants to show only those of the 64 
potential situations (“possible worlds” in more modern terminology) where the 
statement is true. 

From our viewpoint, Euler provided unique and constructive clarity in 
Example 2. He actually generated all possible cases given his use of the 
“some” quantifier. This constructive algorithmic approach is very sound 
from a modern computer science viewpoint. 

The need to develop another representation comes from the fact that 
as n, the number of predicates (A, B, C, ...), grows, the number of diagrams 
for a single statement may grow exponentially with n, and the compactness of 
the visual representation will be lost. This actually happened with Venn 
diagrams. 



3.2 Venn Diagrams 

Venn diagrams were invented in the 19th century [Venn, 1880, 1881] and are 
widely used. Below we discuss examples provided in [Shin & Lemon, 2003]. 

Example 3. Display “Nothing is A.” Figure 5 displays this statement as a 
Venn diagram. 



To represent “Nothing is A” the diagram conveys two statements, “All A 
are B” and “No A is B”. These statements are not contradictory only if A is 
empty. The first statement is represented by the part of the diagram 
shown in Figure 6 on the left. The second statement is shown in Figure 6 in 
the middle, and their combination is shown on the right. 



At first glance, there is no intuitive meaning in representing “All A are B” 
and “No A is B” with the diagrams shown in Figure 6. At least, these diagrams 
do not match the meaning of Euler diagrams. Venn added shading to the legend 
to represent the empty part of the diagram. In the left diagram, the shaded 
part of A indicates that this part is empty. Using Euler’s approach we can 
read that this part of A does not belong to B. Combining this with the fact 
that it is empty, we conclude that “All A are B”. Similarly, in the middle 
diagram, we note that the shaded area consists of the common elements of A 
and B. Because the area is shaded, it is empty; that is, there is no A that 
is also B: “No A is B”. Overlaying one diagram on the other, we obtain the 
resulting diagram shown on the right in Figure 6. It is important to notice 
that, knowing the Venn legend, we can read the right-hand diagram directly: 
every part of A is empty, since all parts of A are shaded. 
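
The overlay step described above can be mimicked with ordinary set union. In the following minimal sketch, a Venn diagram over two sets is represented only by its set of shaded (empty) regions; the region names are illustrative assumptions.

```python
# Regions of the two-set primary diagram, named by membership.
A_ONLY, A_AND_B, B_ONLY = "A only", "A and B", "B only"

# Shading marks a region as empty (Venn's legend).
all_a_are_b = {A_ONLY}     # nothing in A lies outside B
no_a_is_b = {A_AND_B}      # nothing lies in both A and B

# Overlaying two diagrams is the union of their shaded regions.
combined = all_a_are_b | no_a_is_b

# "Nothing is A" holds when every region of A is shaded.
nothing_is_a = {A_ONLY, A_AND_B} <= combined
print(nothing_is_a)
```

Reading the result is then a subset test: the conclusion follows because both regions that make up A appear in the combined shading.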

Figure 6 has two important properties: it shows the inference of the result 
and it gives an intuitive graphical way to prove the statement: 

All A are B & No A is B => Nothing is A 

Figure 5. Venn diagram for empty set A. 

Figure 6. Reasoning with Venn diagrams 

Venn diagrams use a “primary diagram” legend that shows a general possible 
disposition of two overlapping sets. This is a departure from the original 
Euler idea that permitted combining several Euler diagrams into one. In 
Euler form, this example is presented in Figure 7. 

All A are B & No A is B => Nothing is A (no single Euler diagram) 

Figure 7. Euler diagram for “Nothing is A” 



3.3 Peirce and Shin diagrams 

Example 4. Figure 8 represents the statement, “All A are B or some A is B", 
which neither Euler's nor Venn's system can represent in a single diagram. 




Figure 8. A Peirce diagram “All A are B or some A is B” 

Figure 8 uses Peirce’s legend [Peirce, 1933], where 

• ‘o’ represents emptiness, 

• ‘x’ represents “some” (existential quantifier 3), and 

• a linear symbol connecting the ‘o’ and ‘x’ symbols represents 
disjunctive information. 

In this visual language, Figure 8 indicates that the part of A that does not 
belong to B is empty, as it uses the empty symbol ‘o’, i.e., “All A are B”. 

Example 5. Figure 9 represents the proposition “Either all A are B and some 
A is B, or no A is B and some B is not A.” 

Most people, including Peirce himself, agree that this diagram is too 
complex in comparison with the very intuitive Euler diagram. An alternative 
visualization of Example 5 is shown in Figure 10 [Shin, 2003; Shin & Lemon, 
2003]. This visualization uses the following legend: 

• Venn's shadings are used to designate emptiness, 

• Peirce's ‘x’ is used for existential properties, and 

• Peirce's connecting line between x's is used for disjunctive infor- 
mation. 





Shin and Lemon [Shin & Lemon, 2003] state that this Shin diagram
demonstrates increased expressive power without suffering the loss of visual
clarity that happened in Peirce’s diagram. While this is true, it seems that the
Shin diagram is also limited in scalability. Let us assume that we have more
than two predicates A and B, say, four or five predicates or sets with similar
relations between them.

The number of Shin diagrams can grow exponentially. Consider what will
happen with five sets A, B, C, D, and E. We may need ten pairs of diagrams
of the type shown in Figure 10. Additional graphical elements such as color
and texture can make the scalability problem less severe.
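
The figure of ten pairs for five sets is the number of unordered pairs of
sets, C(5,2). A short sketch, assuming one pair of diagrams per pair of sets
as in the text, shows how this count grows. For comparison it also prints the
number of regions in a single primary diagram with n sets, which is 2^n. The
function names are illustrative only.

```python
from math import comb

def pairwise_diagram_count(n_sets):
    """Unordered pairs of sets; one pair of diagrams per pair is assumed."""
    return comb(n_sets, 2)

def primary_region_count(n_sets):
    """Regions of a primary diagram with n_sets sets, including the outside."""
    return 2 ** n_sets

for n in range(2, 7):
    print(n, pairwise_diagram_count(n), primary_region_count(n))
```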

Regardless of this limitation, Shin diagrams have several important
properties that make them equivalent to rigorous systems expressed in formal
logic. This formal system is sound and complete in the same sense that some
symbolic logics are complete [Shin & Lemon, 2003].

Table 3 provides a summary of diagrammatic systems for representing
relations between sets.

Table 3. Characteristics of diagrammatic systems

System | Characteristics
Euler diagrams [1768] | Clear intuitive homomorphic relation between circles and sets. Forces the display of more relations than needed or known.
Venn diagrams [1880] | “Primary diagram” concept and shading for encoding emptiness to represent partial knowledge about relations between sets.
Peirce diagrams [1933] | Symbols for existential and disjunctive information, with loss of visual clarity because of the introduction of more arbitrary conventions.
Shin diagrams [2003] | Two or more Peirce diagrams for restoring Euler and Venn visual clarity.
Hammer and Shin [1998] | Restored Euler's homomorphic relation between circles and sets by adopting Venn's primary diagrams, but without existential statements.



3.4 Graph-based diagrammatic reasoning 

Explanatory reasoning can be naturally described in terms of two graphs
G1 and G2, and a rule R: G1 => G2 that represents an explanation. Here graph
G2 visually conveys something that should be explained and graph G1
conveys something that explains G2; for short, G1 explains G2. Graph G1 can
be viewed as a hypothesis if it is not known that G1 is actually true
[Thagard & Cameron, 1997].

The explanation rule R can be accompanied by a transformation procedure
that shows how to get G2 from G1. The visual proof of the Pythagorean
Theorem is an obvious example of such explanatory visual reasoning, where
G2 is a diagram of the Theorem statement (a^2 + b^2 = c^2) and G1 is another
diagram with known relations between its components. The proof of the
Pythagorean Theorem is a visual transformation of G1 to G2.

Figure 4 in Chapter 1 shows this transformation in six steps, where there
are four graphs G11, G12, G13, and G14 between G1 and G2. In more formal
terms, the rules of transformation R11: G1 => G11, R12: G11 => G12,
R13: G12 => G13, and R14: G13 => G2 form the graph reasoning grammar
[Thagard & Cameron, 1997]. The major difference between graph
transformations and standard verbal transformation rules is that graph
transformations have a natural visual representation.
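
The chain of rules can be sketched as a small data structure. This is a
minimal sketch of ours, assuming that each rule is recorded only by the names
of its source and target graphs as listed in the text; the diagrams themselves
are not modelled, and the function name derivation is hypothetical.

```python
# Rules of the graph reasoning grammar as (name, source, target) triples,
# following the rule list in the text. Graph contents are not modelled.
RULES = [
    ("R11", "G1", "G11"),
    ("R12", "G11", "G12"),
    ("R13", "G12", "G13"),
    ("R14", "G13", "G2"),
]

def derivation(rules, start, goal):
    """Follow rules from start to goal; return the rule names used, or None."""
    by_source = {src: (name, dst) for name, src, dst in rules}
    path, current = [], start
    while current != goal:
        # Stop if no rule applies or if we have looped longer than the rule set.
        if current not in by_source or len(path) > len(rules):
            return None
        name, current = by_source[current]
        path.append(name)
    return path

print(derivation(RULES, "G1", "G2"))
```

A successful derivation plays the role of the explanation G1 => G2: it lists
the transformation steps that turn the explaining graph into the explained one.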

Full, general explanatory reasoning involves visual transformations along
with verbal reasoning, as the proof of the Pythagorean Theorem shows. A future
multimodal theory of explanatory reasoning may include smell, touch, and
emotion, as [Thagard & Cameron, 1997] suggest in noting that physicians
sometimes diagnose patients using odor. A multimodal reasoning theory is
also called heterogeneous reasoning and is discussed in the next section.

4. HETEROGENEOUS REASONING



Jon Barwise, the late director of the Visual Inference Laboratory at Indiana
University, continued the visual approach taken by Charles Peirce for inference
structures and processes. Barwise and his colleagues developed a visual
notation and the Hyperproof reasoning system, which uses diagrams to make
collections of logical expressions more intuitive [Barwise & Etchemendy,
1994; Allwein & Barwise, 1996]. A further development of this heterogeneous
reasoning approach is presented in Chapter 4. The description of the
Hyperproof system below is adapted from [Barwise & Etchemendy, 1994]. It is
a heterogeneous reasoning system that uses two types of initial (given)
information: (i) a diagram depicting a block world (called the situation), and
(ii) sentences in first-order logic. Typically, sentences describe the goal and
a visual means (diagram) provides a situation description. The system
supports 27 different types of goals, including various “diagrammatic” goals
and sentence-proving goals. Some of the goals are:

• to prove a particular sentence using the given information,

• to show that the sentence cannot be proven from the given information,
and

• to determine characteristics of the blocks in the diagram such as the 
size or shape of a particular block using the diagram and given sen- 
tences. 

For instance, the goal sentence can be to prove that block c is the same 
shape as d, SameShape(c, d), or to determine the size, shape or location of 
the highlighted block. The Hyperproof system uses a diagram legend de- 
scribed in Table 4. See also Figure 11. 



Table 4. Diagram legend

Icon (syntactic element) | Meaning (semantics) | Kripke 3-valued logic (true, false, unknown) predicate sentence or term
Barrel | A block of unknown size | Size(block)=U, where U is a term “unknown”
Orange triangle | Indicator of tetrahedron | Tetrahedron(block)=T (true)
Question mark | Indicator of a block of unknown size and shape | Size(block)=U & Shape(block)=U
Block on checkers board cell (i,j) | Indicator of location of the block on the checkers board | Board(block,i,j)=T
Block outside checkers board | Block with unknown location on the checkers board | ∀ i,j Board(block,i,j)=F
Roman letters | Names of blocks | a,b,c,d,...,z
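
The three truth values in the right-hand column of Table 4 can be sketched in
code. The sketch below assumes the strong Kleene connectives, a common choice
for three-valued logics with an “unknown” value; the source does not spell out
which connectives Hyperproof uses, so this is an illustration only. None
stands for U (unknown), and the block record is hypothetical.

```python
def k_not(a):
    """Negation: unknown stays unknown."""
    return None if a is None else (not a)

def k_and(a, b):
    """Conjunction: false dominates, otherwise unknown propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def k_or(a, b):
    """Disjunction: true dominates, otherwise unknown propagates."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

# A hypothetical block whose shape is known and whose size is unknown.
block = {"Tetrahedron": True, "Large": None}

print(k_and(block["Tetrahedron"], block["Large"]))  # None (unknown)
print(k_or(block["Tetrahedron"], block["Large"]))   # True
print(k_not(block["Large"]))                        # None (unknown)
```

Under these connectives a diagram with unknown sizes or shapes can still
settle some sentences, which is what allows a partially specified diagram to
take part in a proof.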



During the reasoning process, the system may ask the user to determine
the size of some block, say block e (see Figure 11). Barwise and Etchemendy
noticed that constructing proofs of antecedently specified sentences
such as A => B is rarely performed in everyday life. More commonly, one must
determine whether the goal can be satisfied with the information at hand.
They point out that (1) Sherlock Holmes is not told that the butler is the
murderer and asked to prove this assertion, and (2) Holmes’ goal is to find
out who did it and possibly recognize that more evidence must be gathered.




Figure 11. Hyperproof (with permission from Barwise & Etchemendy, 
http://www-csli.stanford.edu/hp/Hproof2.html). See also color plates. 

Table 5 and Figure 12 provide an example of a problem on which Hyperproof
work is based [http://www-csli.stanford.edu/hp/Hproof2.html].

The visual solution provided by Hyperproof follows this intuitive logic,
in which humans seamlessly combine visual information from the diagram with
sentences. Barwise and Etchemendy have shown that the solution of this
problem, when completely converted to non-visual first-order logic sentences,
requires several hundred steps. Here, based on the sentences and the diagram,
a user immediately recognizes that all blocks to the left of the dodecahedra
in column 6 are irrelevant to the task, which cuts the search significantly.



Table 5. Hyperproof example

Goal/Questions | Find out whether the block named “a” can be identified from the given visual and textual information, and find this block on the diagram if it can be identified.
Given information | (1) Block b is a dodecahedron, Dodec(b).
    (2) Block b is to the left of block a (from our perspective), LeftOf(b,a).
    (3) Block a is large, Large(a).
    (4) The tetrahedron that is farther right from b is large, Tetra(x) & RightOf(b,x) & FartherOf(b,x).
    (5) A diagram is shown in Figure 12 (only one block is to the right of the dodecahedra, and this block is a tetrahedron).
Intuitive reasoning | According to (1), block b is a dodecahedron. According to the diagram, block b must be one of the two dodecahedra in the sixth column. According to (2), block a is to the right of block b. According to the diagram, there is only one block to the right of block b, and this block is farther right than one of the dodecahedra. According to (4), this block is large. Thus, block a should be a large tetrahedron.

Figure 12. Task diagram (with permission from Barwise & Etchemendy,
http://www-csli.stanford.edu/hp/Hproof3a.html). See also color plates.
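
The intuitive reasoning of Table 5 can be sketched as a filter over a diagram.
The sketch below is ours and is not Hyperproof code. The DIAGRAM data is an
invented stand-in for Figure 12, the function name identify_a is hypothetical,
and a block whose size the diagram leaves unknown is treated as large only when
sentence (4) forces it.

```python
# A hypothetical stand-in for the diagram in Figure 12: each block has a
# column, a shape, and a size (None means the diagram leaves it unknown).
DIAGRAM = [
    {"id": 1, "col": 2, "shape": "cube",  "size": "small"},
    {"id": 2, "col": 4, "shape": "tetra", "size": "small"},
    {"id": 3, "col": 6, "shape": "dodec", "size": "small"},
    {"id": 4, "col": 6, "shape": "dodec", "size": "large"},
    {"id": 5, "col": 8, "shape": "tetra", "size": None},
]

def identify_a(diagram):
    """Ids of blocks that can be named a under sentences (1)-(4)."""
    ids = set()
    for b in diagram:
        if b["shape"] != "dodec":                      # (1) b is a dodecahedron
            continue
        right = [x for x in diagram if x["col"] > b["col"]]   # (2) a is right of b
        tetras = [x for x in right if x["shape"] == "tetra"]
        if not tetras:
            continue
        farthest = max(tetras, key=lambda x: x["col"])
        for a in right:
            # (4) the tetrahedron farthest to the right of b is large
            size = "large" if a is farthest else a["size"]
            if size == "large":                        # (3) a is large
                ids.add(a["id"])
    return sorted(ids)

print(identify_a(DIAGRAM))
```

Blocks to the left of column 6 never enter the candidate list, which is the
pruning step that a human reader performs at a glance.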



If a diagram is not given, then several diagrams are generated and the
same task is solved with each of them. Each diagram is tested for whether it
is compatible with the sentences and whether block a can be identified using
the given sentences and the assumed diagram.

Barwise and Etchemendy [Barwise & Etchemendy, 1995] claim that it is
not necessary to create an Interlingua to be able to reason with heterogeneous
information. We feel that some clarification is needed here. The diagram
in Hyperproof is defined formally using the same predicates that are used in
sentences. There is a one-to-one mapping of the visual legend of the diagram
with predicates to terms in sentences and a formal description of the
diagram.

Thus, in some sense, the Interlingua is not needed here because of the
way both representations were designed. If the diagram were described in,
say, the traditional terms of computer graphics, e.g., as OpenGL code with
concepts such as lines, rectangles, and textures or as a single raster bitmap
image, an Interlingua would surely be needed. This problem is well known
in scene analysis, robotics, and geospatial imagery analysis, where
information is not only heterogeneous but also obtained from disparate sources
that have not been coordinated in advance.



5. GEOMETRIC REASONING 

Geometric reasoning has a long and inspiring history traced back to the
Greeks. One might hope from the term that modern geometric reasoning
may continue the intuitively clear visual geometric line of reasoning
inherited from the Greeks. In fact, as our short review will show, this is not
yet the case.

In the 1950s, one of the first and seminal efforts of Artificial Intelligence
(AI) research was to simulate human geometric reasoning in a computer
program [Gelenter, 1959]. “This research activity soon stagnated because the
classical AI approaches of rule based inference and heuristic search failed
to produce impressive geometric reasoning ability” [Kapur & Mundy,
1989].

The next attempts at computer simulation of geometric reasoning were
algebraic approaches developed in the 1980s and 1990s [Kapur & Mundy,
1989; Chou & Gao, 2001]. In fact, both approaches were a modern return to
the Cartesian tradition in mathematics, which was very successful for
centuries by transforming visual geometric tasks into sets of algebraic,
vector, and matrix equations or non-visual If-Then rules. As we shall see
below, both rule-based and algebraic approaches departed from intuitively
clear visual geometric proofs.

Below we describe the frameworks of both approaches using work
adapted from [Chou & Gao, 2001]. In the algebraic approach, geometric
statements are converted to a set (conjunction) of equations

$$
\begin{aligned}
h_1(y_1, y_2, \ldots, y_m) &= 0 \\
h_2(y_1, y_2, \ldots, y_m) &= 0 \\
&\;\;\vdots \\
h_r(y_1, y_2, \ldots, y_m) &= 0 \\
c(y_1, y_2, \ldots, y_m) &= 0.
\end{aligned}
$$

It is usually assumed that coefficients in the equations are rational numbers.
Thus, the algebraic form of the geometric statement would be

$$\forall y \, [(h_1(y) = 0 \wedge \ldots \wedge h_r(y) = 0) \Rightarrow c(y) = 0].$$
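
To make the algebraic form concrete, the sketch below encodes one statement of
our own choosing (not taken from [Chou & Gao, 2001]): in a parallelogram the
diagonals bisect each other. The coordinates A=(0,0), B=(u1,0), D=(u2,u3),
C=(x1,x2) and all function names are assumptions. The check is a brute-force
search over a small rational grid, not Wu's method or a Groebner basis
computation; it only tests the implication on the sampled points.

```python
from fractions import Fraction
from itertools import product

def h1(u1, u2, u3, x1, x2):
    """AB parallel to DC: cross product of B-A and C-D is zero."""
    return u1 * (x2 - u3)

def h2(u1, u2, u3, x1, x2):
    """AD parallel to BC: cross product of D-A and C-B is zero."""
    return u2 * x2 - u3 * (x1 - u1)

def c(u1, u2, u3, x1, x2):
    """Midpoint of AC equals midpoint of BD (sum of squares is zero)."""
    return (x1 - (u1 + u2)) ** 2 + (x2 - u3) ** 2

def implication_holds(u, grid):
    """Check that h1=0 and h2=0 imply c=0 for all (x1, x2) drawn from grid."""
    for x1, x2 in product(grid, repeat=2):
        y = (*u, x1, x2)
        if h1(*y) == 0 and h2(*y) == 0 and c(*y) != 0:
            return False
    return True

grid = [Fraction(n, 2) for n in range(-12, 13)]
print(implication_holds((Fraction(2), Fraction(1), Fraction(3)), grid))  # True
print(implication_holds((Fraction(0), Fraction(1), Fraction(3)), grid))  # False
```

The second call fails because u1 = 0 collapses side AB to a point. Such
degenerate configurations are why algebraic provers typically attach
non-degeneracy side conditions to the statement they prove.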

Chou, Gao, and Zhang [Chou, Gao & Zhang, 1996] also demonstrated
that a revived AI rule-based approach was able to provide valuable results:
short proofs of tasks where an algebraic solution of polynomial equations
was long. A geometric rule or axiom used in their Geometry Expert (GEX)
system has the following form:

$$\forall x \, [(P_1(x) \wedge \ldots \wedge P_k(x)) \Rightarrow Q(x)],$$

where x is a point occurring in the geometry predicates P_i and Q. The
following is one of the rules used in GEX (a diagram is presented in Figure 13):

AB ∥ CD if and only if Angle(AB, PQ) = Angle(CD, PQ)

Figure 13. Diagram for non-visual rule R1

The GEX system implements several methods. According to [Chou, Gao
& Zhang, 1996], one of the methods (Wu’s method, based on polynomial
equations with the characteristic set) has been used to prove more than 600
geometric theorems. Another method (based on high-level geometric lemmas
about geometric invariants) produced short, elegant, and human-readable
proofs for more than 500 geometry theorems. Other methods use
the Groebner basis method for polynomial equations, the calculation of
vectors, complex numbers, and full-angles. The full-angle method is a
rule-based method that produced very short proofs in cases where all other
methods fail because very large polynomials arise during the proving process.
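
The parallelism rule given with Figure 13 can be checked numerically. The
sketch below is our illustration and is not GEX code. It assumes that lines
are given by two points and that the angle between two lines is measured
modulo pi, since lines have no orientation; the function names and the
tolerance are hypothetical.

```python
import math

def direction(p, q):
    """Direction angle of the line through p and q, reduced modulo pi."""
    return math.atan2(q[1] - p[1], q[0] - p[0]) % math.pi

def line_angle(l1, l2):
    """Angle from line l1 to line l2, modulo pi."""
    return (direction(*l2) - direction(*l1)) % math.pi

def same_angle(a, b, tol=1e-9):
    """Compare two angles modulo pi, allowing for wrap-around at pi."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d) < tol

def parallel_by_rule(ab, cd, pq):
    """AB is parallel to CD iff Angle(AB, PQ) equals Angle(CD, PQ)."""
    return same_angle(line_angle(ab, pq), line_angle(cd, pq))

AB = ((0, 0), (2, 0))
PQ = ((0, 0), (1, 1))
print(parallel_by_rule(AB, ((5, 1), (0, 1)), PQ))  # True
print(parallel_by_rule(AB, ((0, 0), (1, 2)), PQ))  # False
```

The rule itself never refers to the picture: it is a relation between
predicates, which is why the text calls it non-visual.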



6. EXPLANATORY VS. DEDUCTIVE REASONING 

The visual reasoning models we considered above focused on deductive
reasoning. A typical deductive reasoning model distinguishes data from
hypotheses, explains data by mapping hypotheses to data, and does not
explain hypotheses. This is a common situation in many machine learning and
data mining settings. Thagard and Cameron [Thagard & Cameron, 1997]
claim that such models are not adequate for pictorial discovery. They argue
that

1. explanatory (abductive) reasoning models should be used instead, and
models should include explanatory hypotheses that are explicitly
formed and evaluated, and

2. explanatory reasoning should not be equated with relatively simple
formal logical deductive reasoning models, such as the models presented in
[Bylander, Allemang, Tanner & Josephson, 1991; Konolige, 1991].

These arguments are supported by the example of visual explanatory
reasoning from archeology that we describe in section 7 below.

The term “abductive” for explanatory reasoning was coined by Charles
Peirce a hundred years ago along with the term “iconic” reasoning
[Hartshorne & Weiss, 1958; Thagard, 1988].

Explanatory reasoning is informally described as defining a set of charac- 
teristics concerning the explanatory hypothesis. Indeed, this serves as the 
major characteristic of explanatory reasoning. More comprehensively, ex- 
planatory reasoning should deal with: 

• a hierarchy of hypotheses where some hypotheses explain others
(layered hypotheses);

• a set of hypotheses that is not given in advance and needs to be
discovered or constructed (creative hypotheses);

• hypotheses that may contradict each other, so that it is necessary to
establish a domain theory (revolutionary hypotheses);

• hypotheses that may not explain all the data (incomplete hypotheses);

• hypotheses that are not verbal but visual (iconic hypotheses);

• constraints for building hypotheses that are known only partially, with
methods for testing constraint satisfaction that may not be obvious.

For more detail, we refer to [Thagard & Cameron, 1997]. The motivation
for introducing many of the characteristics listed above is obvious. There is
also an argument that deductive reasoning is not necessarily explanatory.

For example, we can deduce the height of a flagpole from information 
about its shadow along with trigonometry and laws of optics. However, it 
seems odd to say that the length of a flagpole's shadow explains the flag- 
pole's height. . . Some additional notion of causal relevance is crucial to 
many kinds of explanation, and there is little hope of capturing this no- 
tion using logic alone [Thagard & Cameron, 1997]. 
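
The asymmetry in the flagpole example can be made concrete. The same relation,
height = shadow × tan(elevation), supports deduction in both directions, yet
only one direction matches the causal story. The sketch below assumes flat
ground, a vertical pole, and a known sun elevation angle; the function names
are ours.

```python
import math

def height_from_shadow(shadow_length, sun_elevation_deg):
    """Deduce the pole height from its shadow (valid deduction, odd explanation)."""
    return shadow_length * math.tan(math.radians(sun_elevation_deg))

def shadow_from_height(height, sun_elevation_deg):
    """Deduce the shadow length from the pole height (the causal direction)."""
    return height / math.tan(math.radians(sun_elevation_deg))

h = height_from_shadow(10.0, 45.0)
print(round(h, 6))                             # 10.0
print(round(shadow_from_height(h, 45.0), 6))   # 10.0
```

Nothing in the two functions marks one of them as the explanation, which is
the point of the quoted passage: the notion of causal relevance has to come
from outside the deductive machinery.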

The hope is that visually layered reasoning will be especially useful in 
capturing, justifying (and possibly refuting) causal relations. 



7. APPLICATION DOMAINS 

Applications of diagrammatic, explanatory reasoning to architectural
design are explored in [Barwise & Etchemendy, 1998; Barker-Plummer &
Etchemendy, 2003] and in Chapter 4. The role of visual problem solving in
graphic design is discussed in [Lieberman, 1995]. Design reasoning
involves multiple representations of information, complex rationale, and goal
structures. Design reasoning naturally fits the diagrammatic approach. A
new application domain for diagrammatic reasoning has emerged recently in
the area of automated theorem proving [Jamnik, 2001; Shin & Lemon,
2003].

Another application of the diagrammatic approach is the analysis of
locations and motions of vehicles and individuals engaged in a military
exercise [Chandrasekaran et al., 2002]. The goal of this work is to use
diagrammatic reasoning to infer maneuvers, plans, and intentions of the two
sides with the intention of improving the efficiency of decision making. The
use of diagrammatic reasoning for this task is intended to

• decrease overload for a decision maker, 

• automatically generate hypotheses about emerging threats and devia- 
tions of a side’s behaviors from the expected behaviors, 

• summarize a vast amount of detail in diagrams, 

• provide insights about diagrammatic constructs that are the most
important, and

• provide insights about diagrammatic constructs that are supportive of 
specific types of inferences. 

A variety of questions are of interest in this application, such as:

Is there a movement of an opposing force toward the left flank of the 
current force FI? 

The answer should be inferred from the map with iconic representation of 
the sides, left and right flanks of both friends and enemies. 

Our next example of the use of diagrammatic reasoning is presented in
[Pisan, 2003] for reasoning about supply and demand of cassette tapes. The
input information is presented as a plot of two linear functions,
Supply(quantity) = price and Demand(quantity) = price, which is sketched in
Figure 14. The visual reasoning system called SKETCHY is able to interpret
the plot and answer questions such as:

At what point is Supply equal to Demand? 

What is the price for the Supply line when the quantity is 350? 

If a user changes the plot, the system adjusts its answers. 
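
The two questions above reduce to evaluating a line and intersecting two
lines. The sketch below is ours and does not reproduce SKETCHY, which
interprets the drawn plot itself; here the lines are assumed to be already
recovered as slope and intercept. The coefficients are invented for
illustration and are not taken from [Pisan, 2003].

```python
from fractions import Fraction

# Hypothetical lines price = slope * quantity + intercept.
SUPPLY = (Fraction(1, 100), Fraction(1))     # price rises with quantity
DEMAND = (Fraction(-1, 100), Fraction(9))    # price falls with quantity

def price_at(line, quantity):
    """Price on the given line at the given quantity."""
    slope, intercept = line
    return slope * quantity + intercept

def intersection(line1, line2):
    """(quantity, price) where the two lines meet, or None if parallel."""
    (m1, b1), (m2, b2) = line1, line2
    if m1 == m2:
        return None
    q = (b2 - b1) / (m1 - m2)
    return q, price_at(line1, q)

print(intersection(SUPPLY, DEMAND))   # quantity 400, price 5
print(price_at(SUPPLY, 350))          # 9/2
```

Changing either tuple and calling the functions again corresponds to the
system adjusting its answers when the user changes the plot.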



[Plot of the Supply and Demand lines; vertical axis: Price]

Figure 14. Plot used for automatic visual reasoning

Another example of visual explanatory reasoning has been provided in
[Thagard & Cameron, 1997] from the area of archeology. The visual fact to
be explained consists of two unusual notches in the skullcap of an
australopithecine. The placement, depth, and direction of the notches can be
expressed as formal sentences with spatial predicates, but they are more
naturally given visually.

Two hypotheses have been generated in this example:

Hypothesis 1: The notches had been inflicted by two separate blows
from a weapon wielded by another hominid.

Hypothesis 2: The notches had been created by a leopard, which
had taken the australopithecine's head in its mouth.

These hypotheses were accompanied by several explanations, other
hypotheses, scenarios, and facts. These are presented in Table 6.



Table 6. Hypotheses, explanations, facts, and scenario

Hypothesis 1
Explanation 1 | Notches had been made at divergent angles from the centerline (that is clearly visible on the picture without any formalization).
Domain theory, scenario 1 | Human evolution had been driven by murder and cannibalism (“killer ape” hypothesis).

Hypothesis 2
Explanation 2.1 | The lower canine teeth of leopards diverge and are about the right distance apart.
Explanation 2.2 | A fossil leopard jaw from the same site fits the notches fairly well.
Supporting fact 2.1 | The entrance to the cave was a vertical shaft when the australopithecine remains were deposited, and those remains are mostly comprised of skull fragments.
Supporting fact 2.2 | Leopards visit similar shaft caves in the area today and use the trees that grow around the entrances as places to consume prey out of the reach of hyenas.
Supporting fact 2.3 | Leopards tend to destroy the skeletal material of their primate prey with the exception of the skulls.
Explanation scenario | The skulls have fallen from the trees and into the cave shafts.


Formal, non-visual reasoning would require conversion of visual input
such as human visual memory of site observation, pictures, and video into
non-visual sentences. This is complex work and one of the major drawbacks
and reasons for failure in the development of knowledge bases in many
domains. In addition, this approach may require conversion back from
non-visual sentences to the task’s original visual form, since that is natural
for domain experts. Finally, in trying to follow a non-visual approach, we may
have a complex and unnecessary double conversion: visual -> non-visual ->
visual (see Chapter 1 (Section 1.5) for more detail on double conversions).

8. HUMAN AND MODEL-BASED VISUAL REASONING AND REPRESENTATIONS



8.1 Spatial reasoning vs. visual reasoning 

There is a widespread belief that human reasoning is faster with material
that is easy to visualize than with material that is hard to visualize.
Knauff and Johnson-Laird [Knauff & Johnson-Laird, 2000] stated that the
literature does not provide consistent evidence for such a claim. To explore
this issue, four types of verbal relations were identified in [Knauff &
Johnson-Laird, 2000; Knauff, Fangmeier, Ruff & Johnson-Laird, 2003]:

(1) spatio-visual relations that are easy to envisage both spatially and
visually (e.g., above-below),

(2) “control” relations that are hard to envisage either spatially or
visually (e.g., better-worse),

(3) spatial relations that are hard to envisage visually but easy to envis- 
age spatially, and 

(4) visual relations that are hard to envisage spatially but easy to envis- 
age visually (e.g., cleaner-dirtier). 

The following deductive reasoning task from [Knauff & Johnson-Laird,
2000] illustrates reasoning with the visual but not spatial relation
cleaner-dirtier.

The dog is cleaner than the cat. 

The ape is dirtier than the cat. 

Does it follow: The dog is cleaner than the ape? 
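
The logical core of this task is a transitive inference, which can be sketched
as follows. This is an illustration of ours, not a model from the cited
experiments. It assumes that “dirtier” is simply the converse of “cleaner”
and that the relation is transitive; the function name follows is
hypothetical.

```python
def follows(premises, query):
    """Decide a cleaner/dirtier query by transitive closure over the premises."""
    cleaner = set()
    for x, rel, y in premises:
        # 'dirtier' is treated as the converse of 'cleaner'.
        cleaner.add((x, y) if rel == "cleaner" else (y, x))
    changed = True
    while changed:
        changed = False
        for a, b in list(cleaner):
            for c, d in list(cleaner):
                if b == c and (a, d) not in cleaner:
                    cleaner.add((a, d))
                    changed = True
    x, rel, y = query
    return ((x, y) if rel == "cleaner" else (y, x)) in cleaner

premises = [("dog", "cleaner", "cat"), ("ape", "dirtier", "cat")]
print(follows(premises, ("dog", "cleaner", "ape")))   # True
```

The inference needs only the order of the three animals. Any vivid image of
what a dirty animal looks like is irrelevant to it, which bears on the timing
results discussed next.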

In experiments, these authors measured the time for solving similar tasks
with different types of relations. The speed of reasoning (solving these
problems) was in accordance with the order of the relations listed above, where
(1) was the fastest. Reasoning with relations of type (1) took 2200 ms, with
relations of type (2) 2384 ms, and with relations of type (4) 2654 ms. Note
that the speed difference between (1) and (2) was not statistically
significant. Also, the fact that reasoning with (4) was slower than with (2)
might not have been expected in advance. Knauff and Johnson-Laird [Knauff &
Johnson-Laird, 2000] provided the following explanation of this result.

A relation that has a natural spatial model should speed up the process of
reasoning. In contrast, a visual relation, such as dirtier, may elicit
irrelevant visual detail. One imagines, say, a cat caked with mud, but such a
representation is irrelevant to the transitive inference. It takes additional
time to replace this vivid image with one in which dirtiness is represented
in degrees. In other words, the visual relations, which are hard to envisage
spatially, lead to a mental picture.

Another experiment conducted in [Knauff, Fangmeier, Ruff & Johnson-Laird,
2003] supported the explanation about the additional time:

All relations elicit mental models that underlie reasoning, but visual rela- 
tions in addition elicit visual images. 

This experiment used functional magnetic resonance imaging to identify
types of brain activities during work with relations (1)-(4), which were
presented acoustically via headphones (without any visual input).

These experiments are important for understanding the limits of
efficient visual reasoning and problem solving.



8.2 Cognitive operations 

Human image-processing abilities are critical in everyday problem solv- 
ing and have been very efficient in important scientific discoveries [Shepard 
& Cooper, 1982]. Biologically inspired models for reasoning with images 
can be potentially very efficient. In this section, we review the current cogni- 
tive theory on this subject as a potential base for models of visual and spatial 
reasoning and decision making. 

Table 7 below outlines a cognitive theory of human image processing
based on [Kosslyn, 1999]. It includes four operations: image generation,
inspection, maintenance, and transformation.



Table 7. Human image processing [Kosslyn, 1999]

Image generation | One generates an image of an object by looking up a "visual code" in associative memory, which in turn activates a visual memory in the ventral system. A spatial pattern of activation is imposed on topographically organized regions in the occipital lobe. If a detailed image is required, one now accesses associative memory and looks up the most distinctive part or property as well as its location. One then shifts attention to the appropriate location.
Inspection of imaged patterns | One inspects the imaged pattern by shifting the attention window over it to encode previously unconsidered properties (e.g., the shape of an animal's ears). Imaged patterns are recognized by matching the input to stored visual memories. Spatial relations are encoded using the dorsal system. Once one has formed an image, one can "see" parts that are embedded in the shape, and were not noticed explicitly when the object was first encoded.
Image maintenance | One can maintain it [the image] by re-activating the visual memory representations in the ventral system; these representations eventually degrade due to adaptation, hence an image cannot be retained indefinitely.
Image transformation | One can transform an imaged pattern by shifting the pattern in spatially organized regions of the occipital lobe.

Thus, according to this theory, human image processing starts from input
visual data “as is” without identifying specific properties and parts. Then the
process of detailed image generation starts, in which the most distinctive
parts and properties are identified first, before other properties. This
identification is done by matching the input to stored visual memories.

8.3 Visual representation and reasoning models 

Visual reasoning and problem solving depend heavily on the visual
representation used. Below we describe some visual representation models
found in computer science and cognitive science, summarized in Table 8.

Table 8. Visual representation models

Array-based visual semantic representation | Location of the item in the array is matched approximately to its location in a 2-D image or 3-D scene. An item can be represented as a subarray if it has its own internal subitems. Locations and sizes of the items in the array permit representing relations such as left-of and north-of. Limitation of the model: arrays do not represent directly a relation when a single top triangle is over both of the lower triangles.
CaMeRa model | Pictorial information consists of a bitmap and associated node-link structures that provide semantic metadata.
Semantic network with scene graphs | A node within a semantic network can represent a complete concept. A node in a semantic network can be linked to a scene graph. A node within a scene graph could be a part of a hierarchy representing a single visual object.
Knowledge-based model | The stored scene knowledge includes:
    • the observed features and relationships such as part-whole relationships, constraints among the subparts (algebraic constraints, “restriction graphs”), and relationships over time;
    • expected objects (presented as models, parameterized object models, “object graphs”, “observation graphs”, slots, filler frames);
    • prediction tools (“prediction graphs”) to predict expected objects using observed features and relationships;
    • “interpretation graphs” to eliminate inconsistencies in the match of observed features and relationships to the models using more reasoning;
    • hypotheses (primarily top-down) to drive prediction and interpretation.
Probabilistic model | Knowledge-based model refined with probabilistic information.
2D iconic model | 2D iconic representations built from different views of the 3D model to simplify matching. 2D strings for iconic indexing built as pairs of one-dimensional strings that represent the symbolic projections of the objects on the x and y axes.
Deformable object model | Deformable objects built using statistical rather than geometric relationships. Such representations can be learnt from examples.

This table is based on [Croft & Thagard, 2002; Buxton & Neumann,
1996; Glasgow & Papadias, 1992; Tabachnik-Schijf, Leonardo & Simon,
1997]. These models include array-based models, node-link based models,
semantic networks with scene graphs, knowledge-based models, probabilistic
models, 2D iconic models, and deformable object models.
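
Two of the rows in Table 8 can be illustrated with a few lines of code. The
sketch below is ours and is deliberately simplified: the scene, the item
names, and the function names are hypothetical, rows are assumed to run from
north to south, and the 2D string is reduced to the two orderings of item
names without the relational operators that a full 2D-string notation would
include.

```python
# A hypothetical 2-D scene stored as an array; None marks an empty cell.
SCENE = [
    [None,  "tree", None],
    ["car", None,   "house"],
]

def locate(scene, item):
    """Return (row, column) of the item in the array."""
    for r, row in enumerate(scene):
        for c, cell in enumerate(row):
            if cell == item:
                return r, c
    raise ValueError(item)

def left_of(scene, a, b):
    """a is left of b when its column index is smaller."""
    return locate(scene, a)[1] < locate(scene, b)[1]

def north_of(scene, a, b):
    """a is north of b when its row index is smaller."""
    return locate(scene, a)[0] < locate(scene, b)[0]

def two_d_string(scene):
    """Pair of symbolic projections: items ordered along x and along y."""
    items = [(r, c, cell) for r, row in enumerate(scene)
             for c, cell in enumerate(row) if cell is not None]
    along_x = [name for _, _, name in sorted(items, key=lambda t: (t[1], t[0]))]
    along_y = [name for _, _, name in sorted(items, key=lambda t: (t[0], t[1]))]
    return along_x, along_y

print(left_of(SCENE, "car", "tree"), north_of(SCENE, "tree", "car"))
print(two_d_string(SCENE))
```

Relations such as left-of are read directly from array indices, whereas the
pair of one-dimensional strings keeps only the order of the items along each
axis, which is what makes it usable as an index for matching.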

Iconic image representations are considered biologically plausible
[Buxton & Neumann, 1996]. The term icon is used with a variety of meanings.
One of them, which followed Peirce’s approach, was discussed in section 2
above. Nakayama [1990] and Rao and Ballard [1995] use the term iconic to
describe small visual templates, which constitute visual memory.

Specifically, in [Rao & Ballard, 1995] a set of icons is just a vector of
numeric parameters associated with a pixel or patch, extracted from the image
using local filters with localities of different sizes, and used to identify
rotation. The parameters are called icons because they have a visual
equivalent, and they are small like icons because they cover small patches.
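
The idea of an icon as a vector of local filter responses at several locality
sizes can be sketched as follows. This is our simplification and does not
reproduce the filters of [Rao & Ballard, 1995]: simple differences of patch
means stand in for the actual filters, and the image, the radii, and the
function names are assumptions.

```python
def patch_mean(image, row, col, radius):
    """Mean intensity over a square patch of the given radius, clipped to the image."""
    rows = range(max(0, row - radius), min(len(image), row + radius + 1))
    cols = range(max(0, col - radius), min(len(image[0]), col + radius + 1))
    values = [image[r][c] for r in rows for c in cols]
    return sum(values) / len(values)

def icon(image, row, col, radii=(1, 2)):
    """Vector of horizontal and vertical contrasts at several locality sizes."""
    vector = []
    for k in radii:
        horizontal = (patch_mean(image, row, col + k, k)
                      - patch_mean(image, row, col - k, k))
        vertical = (patch_mean(image, row + k, col, k)
                    - patch_mean(image, row - k, col, k))
        vector.extend([horizontal, vertical])
    return vector

# A hypothetical 6x6 image with a vertical edge: dark left half, bright right half.
IMAGE = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
print(icon(IMAGE, 3, 3))
```

For this image the horizontal components are positive and the vertical ones
are zero, so the short vector already encodes the direction of the edge at
that pixel.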

These icons can be called low-level icons. They show just line direction
or pixel distribution. High-level icons represent real world concepts such as
a house or bridge. Both icon types can make image representation shorter if
the image can be described only by patches with complex patterns. In
addition, iconic representation is biologically plausible; that is, mental
images possibly are generalized real images in a form that resembles icons
[Buxton & Neumann, 1996; Rao & Ballard, 1995].

Rao and Ballard based their iconic model on biological evidence [Field, 
1994; Kanerva, 1988] about the primate visual system. Specifically, this sys- 
tem takes advantage of the redundancy in the visual environment by produc- 
ing a sparsely distributed coding that aims to minimize the number of simul- 
taneously active cells. 

According to [Kanerva, 1988], the memory operates on features and cre- 
ates internal objects by chunking together things that are similar in terms of 
those features and relatively invariant to the changes in the environment. 

Rao and Ballard [Rao & Ballard, 1995] hypothesize that visual memories 
could consist of iconic representations stored in a distributed manner which 
can be activated by an incoming visual signal or other iconic representation. 
In the context of this hypothesis, visual perception is an activation of mem- 
ory. 

Thus, relatively invariant iconic feature vectors may be viewed as an ef- 
fective medium for vision-related memory storage. The Bruegel visual corre- 
lation system presented in Chapter 10 is icon based and derives benefits 
from such properties of human perception. 

9. CONCLUSION

Reasoning plays a critical role in decision making and problem solving. 
This chapter provided a comparative analysis of visual and verbal (senten- 
tial) reasoning approaches and their combination called heterogeneous rea- 
soning. It is augmented with the description of application domains in visual 
reasoning. Specifics of iconic, diagrammatic, heterogeneous, graph-based, 
and geometric reasoning approaches have been described. Next, explanatory
(abductive) and deductive reasoning are identified and their relationships to 
visual reasoning are explored. The chapter also presents a summary of hu- 
man and model-based reasoning with images and text. Issues considered in- 
clude cognitive operations, differences between human visual and spatial 
reasoning, and image representation. 

The experience of generations of scientists such as Bohr, Einstein,
Faraday, and Watt has shown that visual representations and reasoning can
greatly improve the ability to find and test explanatory hypotheses.
The hope is that systematic visual and heterogeneous reasoning will serve 
the same role although this remains to be seen. We believe that the funda- 
mental iconic reasoning approach proclaimed by Charles Peirce is the most 
comprehensive heterogeneous reasoning approach and as such it needs to be
further developed. 

We share the vision expressed by M. Greaves [2002] as to why structured 
graphics as part of heterogeneous reasoning has been largely excluded from 
contemporary formal theories of axiomatic systems. He concluded that there 
are no other reasons than historical and philosophical heritage stretching 
from the Greeks to the early twentieth-century work of David Hilbert. 



10. EXERCISES AND PROBLEMS 

Simple 

1. Construct an Euler diagram and a reasoning diagram similar to that 
shown in Figures 2 and 3 for the statement “No A is B. All C are B. 
Therefore, no C is A.” Comment on the visual clarity of your result. 

2. Adapted from [Lemon & Pratt, 1997]. Using Euler Circles, try to
represent the following premises:

A ∩ B ∩ C ≠ ∅
B ∩ C ∩ D ≠ ∅
C ∩ D ∩ A ≠ ∅

Show that this is impossible using the diagram shown in Figure 15, where
A ∩ B ∩ C ∩ D ≠ ∅.

Try to modify the diagram in Figure 15 to satisfy A ∩ B ∩ C ∩ D = ∅. Does
it continue to be a set of Euler Circles?




Figure 15: An Euler's Circles representation exhibiting Helly's Theorem 



Advanced 

3. Adapted from [H. Simon, 1995]. Assume that somebody wrote: “I notice
a balance beam, with a weight hanging from a two-foot arm. The other
arm is one foot long.” Then somebody asked the question: “How much
force must I apply to the short arm to balance the weight?”

What kind of reasoning would be appropriate for this situation? Is it ver- 
bal reasoning? If so, what are its axioms and rules of inference, and where 
do they come from? Are the axioms logical, or do they embody laws of 
physics? What kind of heterogeneous reasoning might you suggest? 

Abstract: In this chapter, we describe a computational architecture for applications that
support heterogeneous reasoning. Heterogeneous reasoning is, in its most
general form, reasoning that employs representations drawn from multiple
representational forms. Of particular importance, and the principal focus of this
architecture, is heterogeneous reasoning that employs one or more forms of 
graphical representation, perhaps in combination with sentences (of English or 
another language, whether natural or scientific). Graphical representations in- 
clude diagrams, pictures, layouts, blueprints, flowcharts, graphs, maps, tables, 
spreadsheets, animations, video, and 3D models. By “an application that sup- 
ports heterogeneous reasoning” we mean an application that allows users to 
construct, record, edit, and replay a process of reasoning using multiple repre- 
sentations so that the structure of the reasoning is maintained and the informa- 
tional dependencies and justifications of the individual steps of the reasoning 
can be recorded. Our architecture is based on the model of natural deduction in 
formal logic. In this chapter we describe and motivate the modifications to the 
standard logical model necessary to capture a wide range of heterogeneous 
reasoning tasks. The resulting generalization forms our computational
architecture for heterogeneous reasoning (CAHR).

Key words: heterogeneous reasoning, diagrammatic representation, sentential
representation, decision support, rationale capture, constrained graphical editing.



1. INTRODUCTION 

Until recently, theoretical accounts of reasoning have been limited to 
homogeneous linguistic reasoning; that is, reasoning in which all informa- 
tion is represented in the form of sentences of some language, either natural 
or formal. This presents a major obstacle to the application of insights about 
the structure of reasoning to real-world problem solving, for example the 
reasoning involved in the construction of designs, which typically involves
graphical representations. 

This chapter presents a theoretically informed account of real-world 
problem solving based on an understanding of the nature and structure of 
reasoning, or more precisely, rational justification, historically the province
of logic. We believe that the overall structure of rational justification de- 
scribed by the theory of natural deduction [Fitch, 1952; Gentzen, 1935; 
Prawitz, 1965; Prawitz, 1971] is a reasonable model of the gross structure of 
everyday reasoning. We understand natural deduction in its broadest sense, 
in which proofs are seen as recursively structured rationales that display the 
overall structure of reasoning and permit the nesting of proofs within proofs. 

We will begin by describing the traditional logical model of deduction 
and its representation by formal proofs. This model assumes that informa- 
tion is presented as formal sentences in a single language, and that inference 
proceeds by applying rules based on the syntax of these sentences. 

We then illustrate the generalizations necessary to handle heterogeneous 
deduction. The introduction of diagrammatic representations requires
extensions to the traditional model. Most obvious among these is the fact that
diagrammatic inference typically proceeds by making incremental modifications
to a single representation, while sentential inference proceeds by the
introduction of new sentences into a proof. In addition, the reliance on
formal syntax to drive inference must be generalized.

Finally, we describe further generalizations to this model, which permit 
the representation of reasoning not subject to the constraints of formal de- 
duction. This results in an architecture which can in principle be applied to 
numerous domains in which heterogeneous reasoning is carried out, and in 
which formal criteria of validity are not available. Our account allows us to 
describe a large class of real-life reasoning, such as that involved in the de- 
sign of complex artifacts. In particular, the generalization encompasses 
three important features that are not considered by traditional models of rea- 
soning: heterogeneity of representation, of rationale and of goals. 

• Heterogeneity of representation. Most reasoning and design problems
require the marshaling, manipulation and communication of information
represented in a wide variety of formats ([Glasgow, Narayanan &
Chandrasekaran, 1995] presents a collection of influential articles in this
area). For example, in addition to circuit diagrams, an
electrical engineering team may use state machine diagrams, timing 
diagrams and logic ladder diagrams, together with algebraic or natu- 
ral language specifications of the desired input/output behavior, in 
order to produce a design that meets the client's needs [Harel, 1988; 
Johnson, Barwise & Allwein, 1993]. 

• Heterogeneity of rationale. The formal model of proof is far too
constrained to countenance the kinds of reasoning involved in design 
and practical problem solving. In particular, the model is too strict 
to countenance forms of justification based on matters of cost, effi- 
ciency, safety, style, aesthetic judgment, probabilistic considera- 
tions, and the like, justifications that arise at almost every turn in 
real-world problem solving, see for example [Mitchell, 1990]. A 
model of heterogeneous rationales is particularly important for col- 
laborative reasoning, where a consensus must be achieved in spite of 
competing justifications (say aesthetic versus structural) of divergent 
decisions. 

• Heterogeneity of goals. Historically, logic has focused almost ex- 
clusively on a single type of reasoning in which the goal is to show 
that a conclusion is a necessary consequence of some given informa- 
tion. In real-world problem solving, by contrast, the goals of a par- 
ticular reasoning task can be quite varied, see for example [Barwise 
& Etchemendy, 1994]. Design problems typically admit of a wide 
variety of solutions that meet the primary specifications, and selec- 
tion among competing solutions is made on the basis of subsidiary, 
comparative criteria. The primary goal is thus to find any solution 
that satisfies certain requirements, not a unique solution entailed by 
those requirements, and secondary goals guide the choice among 
candidate solutions to the primary goal. For example, in the design 
of a circuit, the primary goal will be a correct circuit, but other crite- 
ria distinguish between competing correct circuits, for example, the 
expense of fabricating the circuit in silicon. Research in logic has 
historically not addressed reasoning with such complex goal struc- 
tures. 

Our computational architecture for heterogeneous reasoning (CAHR) al- 
lows each of these kinds of heterogeneity to be represented within a struc- 
tured document encoding a piece of reasoning. This chapter is an extended 
version of [Barker-Plummer & Etchemendy, 2003]. 



2. SENTENTIAL NATURAL DEDUCTION 

We take the traditional model of formal logic known as natural deduction 
[Fitch, 1952; Gentzen, 1935; Prawitz, 1965; Prawitz, 1971] as our starting 
point. An example of a proof constructed using this model is presented in 
Figure 1. The particular presentation of natural deduction that we use is due 
to the logician Fitch [Fitch, 1952]. 



Chapter 4 



In order to maintain an appropriate flow of information, a reasoning
application maintains a data structure representing the proof. A proof is not
merely an unstructured collection of information-bearing representations;
rather, it records a structured reasoning process that represents a solution to a
reasoning task. It is important to note that the structure of a proof does not 
represent the temporal structure of the reasoning, but rather the underlying 
logical structure of the reasoning.

  1. ∀x (Dodec(x) → ∃y Adjoins(x, y))
  2. ∀x Dodec(x)
  3.
     3:1.1  [a]
     3:1.2  Dodec(a)                              ✓ ∀ Elim: 2
     3:1.3  Dodec(a) → ∃y Adjoins(a, y)           ✓ ∀ Elim: 1
     3:1.4  ∃y Adjoins(a, y)                      ✓ → Elim: 3:1.2, 3:1.3
     3:1.5
        3:1.5:1.1  Adjoins(a, b)
        3:1.5:1.2  Dodec(b)                       ✓ ∀ Elim: 2
        3:1.5:1.3  Dodec(b) → ∃y Adjoins(b, y)    ✓ ∀ Elim: 1
        3:1.5:1.4  ∃y Adjoins(b, y)               ✓ → Elim: 3:1.5:1.2, 3:1.5:1.3
        3:1.5:1.5
           3:1.5:1.5:1.1  Adjoins(b, c)
           3:1.5:1.5:1.2  Adjoins(a, b) ∧ Adjoins(b, c)          ✓ ∧ Intro: 3:1.5:1.1, 3:1.5:1.5:1.1
           3:1.5:1.5:1.3  ∃y ∃z (Adjoins(a, y) ∧ Adjoins(y, z))  ✓ ∃ Intro: 3:1.5:1.5:1.2
        3:1.5:1.6  ∃y ∃z (Adjoins(a, y) ∧ Adjoins(y, z))         ✓ ∃ Elim: 3:1.5:1.5, 3:1.5:1.4
     3:1.6  ∃y ∃z (Adjoins(a, y) ∧ Adjoins(y, z))                ✓ ∃ Elim: 3:1.5, 3:1.4
  4. ∀x ∃y ∃z (Adjoins(x, y) ∧ Adjoins(y, z))                    ✓ ∀ Intro: 3




Figure 1. A Sentential natural deduction proof 



A (sentential) proof is a recursive structure consisting of an ordered se- 
quence of nodes. Each node contains either: 

1. a sentence (which expresses information asserted at that node),
illustrated in Figure 1 at steps 3:1.2 and 3:1.3, for example.

2. a nonempty set of proofs (called “subproofs” or “cases” of the parent
proof). This situation is illustrated in step 3 of Figure 1. This step
contains one subproof, consisting of steps 3:1.1-3:1.6.

The initial node of a proof (or subproof) is called the assumption step. A 
small horizontal tick in the left margin serves to separate the assumption step 
from the remainder of the proof. In our presentation, the main (outer) proof 
can have several assumption steps, though this is a syntactic convenience. 

With the exception of assumption steps, each step in a sentential proof 
must be justified by the use of an inference rule and citation of support.
Together the rule and support provide an unambiguous justification for the
deduction of the new sentence from the support steps.

Nodes in a proof may optionally be numbered for ease of reference. One 
possible numbering scheme that captures the recursive structure of a proof is 
illustrated in Figure 1. 
The steps in each subproof are numbered sequentially, tagged with an
identifier for the subproof. A subproof's identifier is that of the step in
which it is contained, appended with the sequence number of this subproof's
case within that step. The tag 3:1.6, near the bottom of Figure 1, should be
read (backward) as naming the sixth step in the first subproof of the third 
step (of the main proof). 
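
The recursive structure and the numbering scheme just described can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the class names `Node` and `Proof` and the function `label_nodes` are hypothetical and are not taken from any CAHR implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A proof step: either a sentence or a nonempty list of subproofs."""
    sentence: Optional[str] = None
    cases: List["Proof"] = field(default_factory=list)


@dataclass
class Proof:
    """An ordered sequence of nodes."""
    nodes: List[Node] = field(default_factory=list)


def label_nodes(proof: Proof, prefix: str = "") -> Dict[str, Node]:
    """Map tags such as '3:1.6' to nodes, following the scheme of Figure 1."""
    labels: Dict[str, Node] = {}
    for step, node in enumerate(proof.nodes, start=1):
        tag = f"{prefix}.{step}" if prefix else str(step)
        labels[tag] = node
        for case_number, subproof in enumerate(node.cases, start=1):
            # A subproof's identifier is the tag of its containing step
            # followed by the subproof's sequence number within that step.
            labels.update(label_nodes(subproof, f"{tag}:{case_number}"))
    return labels


# Skeleton only: a four-step main proof whose third step holds one
# six-step subproof (the sentences are placeholders).
inner = Proof([Node(sentence=f"step {i}") for i in range(1, 7)])
main = Proof([Node("premise 1"), Node("premise 2"),
              Node(cases=[inner]), Node("conclusion")])
tags = label_nodes(main)
```

With this skeleton, `tags["3:1.6"]` is the sixth step of the first subproof of the third step, which is how the tag is read in the text.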



2.1 Inference 



The sentences occupying the atomic steps of the example proof are writ- 
ten in a formal language known as first-order logic (FOL). This language 
has an unambiguous syntax and semantics. The sentence ∀x Dodec(x) in
step 2, for example, means that every object (in the domain) is a
dodecahedron.¹

The use of a justification is illustrated at step 3:1.2. The inference rule 
used here is known as ∀-Elim (universal elimination). In this step the
support is the sentence appearing in step 2 of the proof. The inference here is
from the universal claim that every object is a dodecahedron, to the
particular claim that a is a dodecahedron, where a is some object in the domain of
the problem. 

The fact that the claim at step 3:1.2 may be deduced from the claim at
step 2 using the ∀-Elim inference rule can be checked syntactically. To do
this we note that the support is a universally quantified sentence, and that the
justified sentence is the matrix (body) of the support sentence, with the bound
variable x replaced with a name a. This is a correct application of the rule
according to the definition of the system.

The number and nature of the available inference rules will depend on the 
particular logical system. In the case of logical deduction, the key require- 
ment is that each inference rule must be stated in completely unambiguous 
terms, and it should be clear when the conditions on the correct application 
of the inference rule have been met. 
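
To make the idea of a syntactic check concrete, the following sketch tests whether a sentence follows from a universally quantified support by ∀-Elim. The encoding of formulas as nested tuples and the names `substitute` and `is_forall_elim` are assumptions made for this example only. The sketch also assumes that names such as a are distinct from all bound variables, so no variable capture can occur.

```python
def substitute(formula, var, name):
    """Replace free occurrences of the variable `var` in `formula` by `name`."""
    kind = formula[0]
    if kind == "pred":
        _, predicate, args = formula
        return ("pred", predicate, tuple(name if a == var else a for a in args))
    if kind in ("forall", "exists"):
        _, bound, body = formula
        if bound == var:
            # `var` is rebound here, so it has no free occurrences below.
            return formula
        return (kind, bound, substitute(body, var, name))
    # Remaining kinds are binary connectives such as "implies" or "and".
    _, left, right = formula
    return (kind, substitute(left, var, name), substitute(right, var, name))


def is_forall_elim(support, conclusion, name):
    """True if `conclusion` is the matrix of `support` with `name` for the bound variable."""
    if support[0] != "forall":
        return False
    _, var, body = support
    return substitute(body, var, name) == conclusion


def dodec(term):
    return ("pred", "Dodec", (term,))


def adjoins(first, second):
    return ("pred", "Adjoins", (first, second))


# The two premises of Figure 1 in the tuple encoding.
step_1 = ("forall", "x", ("implies", dodec("x"), ("exists", "y", adjoins("x", "y"))))
step_2 = ("forall", "x", dodec("x"))

ok = is_forall_elim(step_2, dodec("a"), "a")        # Dodec(a) from step 2
wrong = is_forall_elim(step_2, dodec("b"), "a")     # Dodec(b) does not instantiate with a
```

The check inspects only the shape of the two sentences, which is the sense in which the rule's conditions are unambiguous.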

2.2 Support and the structure of a proof 

In order to ensure logical validity, we may only use information from 
earlier in the proof as support for a later step. The notion of “earlier in the
proof” is made precise by the following observations.



¹ Under our conventional interpretation of the predicate Dodec.




A node is accessible from a second node, and hence available as a poten- 
tial support for it, if either a) both nodes are in the same (sub)proof and the 
first strictly precedes the second, or b) the second node is in a subproof that 
is contained by a node from which the first node is accessible (either directly 
or recursively in virtue of this clause). When node A is accessible from node 
B, we write A < B. Note in particular that a node in a subproof is not acces- 
sible from any node in its parent proof (or in any of its ancestors), nor from 
nodes in sibling subproofs, that is, subproofs with the same parent proof.

Figure 2. The accessibility relation



On the left of Figure 2, we indicate in black all the nodes accessible from 
node 9.2:4.2:2, which is shown in gray. On the right of the same figure, we 
indicate in black all nodes accessible from node 9.2:5. Here, notice that
node 9.2:4, which contains a range of cases (subproofs), is accessible from 
node 9.2:5, but that the nodes within those cases are not accessible from this 
node. 

The accessibility relation determines the inheritance of information in a
proof. That is, if A < B, then the information expressed by representations
contained at node A is also available at node B.

It is important to note that although the nodes of a proof form a partial 
(not linear) order, when we restrict attention to the nodes accessible from a 
specified node, these nodes are linearly ordered. This may be emphasized 
by renumbering the nodes in Figure 2 in the manner shown in Figure 3. 
Thus, each node in a proof has a unique, linear history which we call the 
node's provenance. The information available at a node N of a proof is the
sum of the information expressed at the nodes in N's provenance.

Figure 3. The provenance of a node



The articulation of a proof into nodes, proofs, and subproofs allows us to 
isolate two senses in which information is present at a given stage of reason- 
ing. On the one hand, each node in a proof typically adds new information 
potentially relevant to the solution of the reasoning task in question. This 
additional information may be presented by a new sentence, or by a set of 
cases attached to the node. These cases are interpreted disjunctively, that is, 
as representing a range of possible alternatives. We call the information 
contained in a single step the incremental information associated with the 
node. The justification mechanism allows the user to explain and justify the 
incremental information introduced at a node. But this incremental informa- 
tion is not the only information available at that stage in the reasoning. The 
accessibility relation defines the paths of legitimate information inheritance 
through the proof. At any point in the reasoning, the total information state
is defined by the totality of incremental information accumulated along the 
nodes in the current node's provenance. We call this the cumulative informa- 
tion associated with the node. 

Because the cumulative information available at a node is represented by 
the provenance of that node, any step contained in the provenance is avail- 
able for citation as support for an inference at that node. 
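
The accessibility relation and the provenance of a node can be computed directly from the two clauses of the definition given earlier in this section. The sketch below is one possible rendering; the classes `Node` and `Proof` and the functions `provenance` and `accessible` are hypothetical names introduced for illustration, and the small example proof is our own.

```python
class Node:
    def __init__(self, label, proof):
        self.label = label
        self.proof = proof      # the (sub)proof that directly contains this node
        self.cases = []         # subproofs attached to this node, if any


class Proof:
    def __init__(self, owner=None):
        self.owner = owner      # node containing this subproof; None for the main proof
        self.nodes = []
        if owner is not None:
            owner.cases.append(self)

    def add(self, label):
        node = Node(label, self)
        self.nodes.append(node)
        return node


def provenance(node):
    """Return the nodes accessible from `node`, earliest first."""
    history = []
    current = node
    while current is not None:
        siblings = current.proof.nodes
        position = next(i for i, n in enumerate(siblings) if n is current)
        # Clause (a): strictly earlier nodes of the same (sub)proof.
        history = siblings[:position] + history
        # Clause (b): continue with whatever is accessible from the containing node.
        current = current.proof.owner
    return history


def accessible(a, b):
    """True when node `a` is accessible from node `b` (written a < b in the text)."""
    return any(n is a for n in provenance(b))


# A main proof with three steps; step 2 contains two two-step subproofs.
main = Proof()
n1, n2 = main.add("1"), main.add("2")
case_1, case_2 = Proof(owner=n2), Proof(owner=n2)
a1, a2 = case_1.add("2.1:1"), case_1.add("2.1:2")
b1, b2 = case_2.add("2.2:1"), case_2.add("2.2:2")
n3 = main.add("3")
```

Because `provenance` returns a list, the linear ordering of a node's history is explicit, even though the proof as a whole is only partially ordered.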

3. GENERALIZING TO HETEROGENEOUS DEDUCTION



With the standard model of natural deduction in hand, we turn our atten- 
tion to the task of formalizing the more general domain of heterogeneous 
deduction. Figure 4 illustrates a heterogeneous deduction. 

Figure 4. An office assignment problem



The reasoning recorded concerns the assignment of offices, represented
diagrammatically in the proof, based on a variety of constraints that are
expressed sententially. The reasoning shows that there is only one possible
assignment that satisfies the stated constraints.² In this section, our intent is to
consider the generalizations to the model of sentential deduction that are
necessary to model the heterogeneous case.

The first thing to notice about Figure 4 is that the structure of the
reasoning is very similar to that of a typical natural deduction proof. The
reasoning proceeds by splitting into cases represented by subproofs, each
headed with a new assumption, and then elucidating the consequences of those
assumptions within the subproof. At the end of a subproof, information is exported
to the main proof according to some inference rule. 

The obvious difference between the two cases is that the individual steps 
of the proof contain diagrams, rather than sentences. In fact, in the example
of Figure 4 all of the steps in the proof except the initial assumptions contain 
diagrams, though in general some steps might contain sentences. 



3.1 Graphical representations 

The presence of graphical representations in a proof requires special 
techniques for properly handling information inheritance for graphically dis- 
played information, as well as appropriate editing restrictions that respect the 
inheritance structure of the proof.

In order to describe these techniques and restrictions, we introduce some 
terminology. Every graphical representation is introduced into a proof at a 
specific node, which we call the node of origin of the representation. This 
can, but need not, be the first node of the proof. A graphical representation G
introduced at node A can be modified at any subsequent node in the proof
from which A is accessible, subject to certain editing constraints described
below. In general, the modification of a graphical representation G at node
B does not affect the display of G at earlier nodes. Thus, a single graphical
representation will present different displays at different nodes of a proof.
The specific display of a graphical representation at a particular node is
called a graphic, and the graphic is an instance of the graphical
representation. If graphic G_A at node A and graphic G_B at node B are instances
of the same graphical representation G, and A < B, then G_A is said to be an
ancestor of G_B, and G_B is said to be a descendant of G_A.

Being an instance of the same graphical representation is an equivalence
relation among graphics, i.e., it is reflexive, symmetric and transitive. In
particular, notice that two graphics, neither of which is at a node accessible
to the other, can still be instances of the same graphical representation. This
will happen if (and only if) both graphics result from modifying a common
graphic appearing at a node accessible to both. If G_A and G_B are instances of
the same graphical representation G but neither A nor B is accessible from
the other, then G_A and G_B are said to be cousins.

² The reasoning of Figure 4 is very similar to that implemented in our
Hyperproof program [Barwise & Etchemendy, 1994], the key difference being the
kind of diagram used in this proof (Hyperproof uses diagrams of the placement
of objects on a checkerboard).

The left of Figure 5 illustrates a simple proof structure containing a single
graphical representation. The node of origin of this graphical representation
is node 1. All of the graphics in the proof are descendants of G_1, except for
G_1 itself. Graphic G_{2.1:2} is also a descendant of graphic G_{2.1:1}, and
graphic G_{2.2:2} is a descendant of graphic G_{2.2:1}. Graphics G_{2.1:1} and
G_{2.1:2} are cousins of graphics G_{2.2:1} and G_{2.2:2}. Graphic G_3 is a
descendant of G_1 and a cousin of the remaining graphics in the proof.




Figure 5. Availability of information in a proof 

If a graphical representation G is introduced at node A, then we say that 
G is available at A and at any node from which A is accessible. If A is not 
accessible from node B, then we say that G is not available at B. A repre- 
sentation G can be displayed and edited at a node if and only if it is available 
at that node. On the right of Figure 5, we illustrate the difference in
availability of two graphical representations appearing in a proof. The first
graphical representation, marked a, is introduced at node 1, and is
consequently available at every node in the proof (since node 1 is accessible from
every node in the proof). The second representation, marked b, is introduced
at node 2.1:1. It is available throughout subproof 2.1 (including the
subproofs of this subproof), but not at any other node in the proof. Thus, no
instance of this representation appears at nodes 1, 2.2:1, 2.2:2, or 3, and it 
cannot be modified at these nodes. 
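
The ancestor, descendant, and cousin relations, together with availability, can be decided from node tags alone. The sketch below assumes tags written as in Figure 5, where 2.1:2 is read as step 2 of subproof 2.1, that is, of the first case of node 2. The function names `parse`, `accessible`, `relation`, and `available` are hypothetical and serve only this illustration.

```python
def parse(tag):
    """Split a tag such as '2.1:2' into (enclosing (step, case) pairs, final step)."""
    *outer, last = tag.split(":")
    cases = [tuple(int(p) for p in part.split(".")) for part in outer]
    return cases, int(last)


def accessible(a, b):
    """True when the node tagged `a` is accessible from the node tagged `b`."""
    cases_a, step_a = parse(a)
    cases_b, step_b = parse(b)
    depth = len(cases_a)
    if cases_b[:depth] != cases_a:
        return False                       # a lies in a subproof that does not enclose b
    if len(cases_b) == depth:
        return step_a < step_b             # same (sub)proof: a must strictly precede b
    return step_a < cases_b[depth][0]      # a must precede the step that encloses b


def relation(a, b):
    """How the graphic at node `a` relates to the graphic at node `b`.

    Both graphics are assumed to be instances of one graphical representation.
    """
    if a == b:
        return "same"
    if accessible(a, b):
        return "ancestor"
    if accessible(b, a):
        return "descendant"
    return "cousin"


def available(origin, node):
    """A representation is available at its node of origin and wherever that node is accessible."""
    return node == origin or accessible(origin, node)


figure_5_nodes = ["1", "2.1:1", "2.1:2", "2.2:1", "2.2:2", "3"]
available_b = [n for n in figure_5_nodes if available("2.1:1", n)]
```

Here `available_b` lists the nodes at which the representation marked b in Figure 5 can be displayed and edited.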



3.2 Display and editing of graphical representations 

A basic insight underlying our architecture is that sentential and graphical 
reasoning display similar structural features, features that can be captured by 
means of the proof structure and accessibility relation defined earlier. 
Where these characteristic types of reasoning differ is in how incremental 
and cumulative information must be handled. With sentential reasoning, it is 
generally possible to express by means of a single sentence the precise in- 
cremental information added at any point in the reasoning. This incremental 
information can thus be isolated from the cumulative (sentential) information 
inherited at that node, since sentences expressing the inherited information 
remain at earlier nodes. They are still accessible from the current node, but 
are isolated from the incremental information by virtue of their location in 
the proof.

In contrast, when reasoning employs a graphical representation,
incremental information is typically expressed by means of modifications of the
representation. But these modifications are specified or defined in relation 
to other information-bearing features of the representation. For example, if 
we add a mark indicating a location to an existing map of a city, both the 
graphical modification and the incremental information it conveys presup- 
pose the presence of the existing features of the map. For this reason, it is 
not possible to graphically isolate the incremental information from the cu- 
mulative information inherited from earlier stages in the reasoning. The in- 
cremental information, which includes, for example, the distance from the 
new point to any other location in the city, is distributed throughout the 
graphic. The CAHR architecture handles these characteristics of graphical 
representations by means of special techniques governing the display and 
modification of graphical representations in a proof.

An implementation of CAHR keeps track of where (at which node) 
modifications to a graphical representation are made in a proof. Generally
speaking, the location of a modification has an impact both on where the
modification is displayed and on what subsequent modifications can be
made to the representation at later nodes in the proof. The basic intuition is
that since a modification at a particular node introduces incremental
information, that information is inherited at later nodes. And since
inheritance of graphically expressed information is explicit, subsequent
modifications to the representation should preserve that information. Thus,
the information content of a graphical representation should increase
monotonically as we follow any accessibility path through a proof.

This section describes two types of graphical editing that may be permit- 
ted in a CAHR-based application enforcing this flow of information: incre- 
mental and presentational editing. 

3.3 Incremental editing 

Incremental editing is used for modifications that conform to the
inheritance structure of the proof, and hence is the appropriate edit type to use
when making informationally significant changes to a graphical
representation. When an incremental edit is made to a graphical representation G at
node N of a proof, the modification is displayed in the graphic G_N and in all
descendants of G_N, until the modification is superseded by changes made at
subsequent nodes. If a subsequent edit superseding the change made at N is
made at node L > N, the modification made at N is displayed in the graphic
G_N and in any graphic G_M where L > M > N. Incremental edits made to G_N
have no effect on the ancestors or cousins of G_N.

The office assignment example of Figure 4 illustrates the notion of in- 
cremental editing. As we move through the proof, the (tentative) assignment 
of an individual to an office is represented by the placement of the appropri- 
ate letter in an appropriate location on the diagram. For example, at step 
5.2:1, Barbara is assumed to be assigned the large office. In each of the sub- 
proofs contained in step 5.2:2 this information is inherited and represented 
for the duration of the subproof. Only when the scope of this subproof is
closed at step 6 is the information not displayed at that node (because 5.2:2
is not accessible from step 6).

Note that editing a graphic at a node does not impose additional editing 
constraints on the graphic at that same node, but rather on descendant graph- 
ics. Indeed, anything it is possible to do at a particular graphic, it should be 
possible to “undo” or take back at that same graphic. It is only when the 
user moves to subsequent nodes that she commits to the modifications made 
at the earlier node. Or, to put it another way, at any given node the user is 
freely allowed to modify the incremental information at that node, but must 
respect any constraints imposed by the graphically displayed information 
inherited at that node. 

The example has an important feature, namely that each edit is permanent
for the duration of its scope. Once an individual has been assigned to an of- 
fice, this assignment is not subject to further modification. We call such ed- 
its permanent incremental edits. Permanent incremental editing can be ef- 
fectively used to enforce the monotonicity requirement discussed above. If a 
piece of information has been established at a particular node in a proof, 
then that information is available at any node from which the node in ques- 
tion is accessible. If that information is recorded as a modification to a 
graphic, then the specific modification must be inherited by all descendant 
graphics. Permanent incremental editing guarantees that this will be the 
case, since the modification cannot be subsequently altered at a later (de- 
scendant) node. Permanent edits can, however, be retracted or modified at a 
later time, but the modification must be made at or before the node in which 
the original edit was made. This imposes a practical restriction that aids us- 
ers in the successful completion of the reasoning task: A conclusion or deci- 
sion that was previously made cannot be retracted without navigating to the 
node associated with that modification and reviewing the original justifica- 
tion. This restriction becomes particularly important if the proof is being 
constructed by multiple users or if the construction takes place over an ex- 
tended period of time. 
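
The display rule and the permanence constraint can be sketched as follows. We assume, purely for illustration, that the incremental edits made at a node are stored as a dictionary (a "delta") and that a node's history is the list of deltas along its provenance, earliest first. The function names and the office labels `"large"` and `"small"` are hypothetical.

```python
def inherited(history):
    """Fold the incremental edits along a node's provenance, earliest first."""
    state = {}
    for delta in history:
        state.update(delta)      # a later edit supersedes an earlier one
    return state


def displayed(history, local):
    """The graphic shown at a node: inherited information plus the node's own edits."""
    return {**inherited(history), **local}


def edit_permanently(history, local, key, value):
    """Return the node's own edits with `key` set to `value`.

    Raises ValueError if `key` was already fixed at an earlier node, because a
    permanent edit may only be changed at the node where it was made.
    """
    if key in inherited(history):
        raise ValueError(f"{key!r} was fixed at an earlier node")
    updated = dict(local)
    updated[key] = value
    return updated


# Inside a subproof whose assumption step placed B in the large office.
history = [{}, {"large": "B"}]
local = edit_permanently(history, {}, "small", "A")
local = edit_permanently(history, local, "small", "C")   # still the same node: allowed
shown = displayed(history, local)
```

Attempting `edit_permanently(history, local, "large", "A")` raises an error, which mirrors the requirement that the user navigate back to the node where the original decision was made before retracting it.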

Permanent incremental editing is a species of a more general notion that 
we call constrained incremental editing. A modification to a graphic G_N is a
constrained incremental edit if, in addition to modifying G_N, it narrows the
range of possible modifications that can be made to descendant graphics. 
Permanent editing is the most highly constrained form of incremental edit- 
ing. 

Figure 6 illustrates constrained incremental editing. We assume a 
graphical representation of a single attribute that can be given one of seven 
possible values. 




Figure 6. Constrained incremental editing (left panel: constrained incremental
edit; right panel: information semilattice)
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We have assumed that the values have fixed eonventional interpretations, 
and that the relative significance of the values is expressed by the informa- 
tion semilattice shown on the right of the figure. The value T is a “null” 
value, consistent with any possible assignment to the attribute, while the values
X, Y, and Z are maximally informative and mutually incompatible. The
values U, V, and W are intermediate values: U is consistent with either X or
Y; V is consistent with X or Z; and W is consistent with Y or Z. The proof
shown on the left depicts incremental edits made at nodes 2 and 4. 

The value assigned to the attribute at each node is shown in the square, 
and the square is annotated with the set of permissible values that can be as- 
signed at that node. At node 2, the value of the attribute is changed from T 
to V. This increments the information contained at the node, and hence sub- 
sequent changes to the graphical representation are constrained to those that 
express further refinements of that information. At node 4, the value is 
changed from V to X, again incrementing the information provided by the 
graphic. Since this value is maximally informative, no further modifications 
are allowed at node 5. 
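
The constraint illustrated in Figure 6 can be sketched in a few lines of Python. This is only a minimal illustration, not the CAHR implementation: we assume that each value can be encoded as the set of maximally informative values it is consistent with, and the names `CONSISTENT`, `permissible`, and `constrained_edit` are hypothetical.

```python
# Each value is encoded by the set of maximally informative values it is
# consistent with. This encoding is an illustrative assumption.
CONSISTENT = {
    "T": {"X", "Y", "Z"},  # the "null" value
    "U": {"X", "Y"},
    "V": {"X", "Z"},
    "W": {"Y", "Z"},
    "X": {"X"},
    "Y": {"Y"},
    "Z": {"Z"},
}


def permissible(current):
    """Values that equal or refine the current value."""
    return {v for v, s in CONSISTENT.items() if s <= CONSISTENT[current]}


def constrained_edit(current, new):
    """Apply a constrained incremental edit, rejecting non-refinements."""
    if new not in permissible(current):
        raise ValueError(f"{new} does not refine {current}")
    return new


value = "T"
value = constrained_edit(value, "V")  # the edit made at node 2
after_node_2 = permissible(value)     # values still open to descendants
value = constrained_edit(value, "X")  # the edit made at node 4
after_node_4 = permissible(value)     # only X remains
```

Under this encoding, an edit is admitted exactly when the new value lies at or below the current one in the semilattice, so an attempt to change V to Y is rejected.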



3.4 Post-editing 

Since the temporal order in which reasoning occurs rarely conforms to
the logical structure of the reasoning, it is necessary for any CAHR-based
application to permit the editing of the proof structure, as well as any
representation, after its initial construction. However, both forms of
“post-editing” must respect the display and editing constraints imposed by
incremental edits made to the graphical representations contained in the proof.

Since the display of graphics in a proof is determined by the original 
graphic plus a sequence of incremental “deltas,” the deletion of a node in an 
existing proof can change the graphics displayed at subsequent nodes. Fig- 
ure 7 illustrates the impact of deleting a node containing modifications to a 
graphical representation. In this figure, we examine the effect of deleting
node 2, where the value V was assigned, from the proof of Figure 6. After the
deletion, descendant graphics no longer display the effects of this
modification. Thus at node 3 the attribute retains the value T, until the
modification at node 4.

Post-editing a graphic located at a node can alter subsequent graphics in 
two distinct ways. On the one hand, changing the existing value of an attrib- 
ute will have the expected effect on how the attribute is displayed in descen- 
dant graphics: the new value is displayed until superseded by a different 
value assigned in a descendant graphic. An example of this sort of post- 
editing is depicted in the center of Figure 8, where the proof on the left is 
modified by changing the value of the attribute at node 2 from V to U. The
new value is then inherited by descendant graphics until it is subsequently 
changed to X. 

Figure 7. Deleting a node



On the other hand, the modification of an existing value can alter the 
editing constraints imposed on descendant graphics, and so render an edit 
impermissible that was formerly permitted. An example is depicted on the 
right of Figure 8, where the value at node 2 of the proof on the left is 
changed from V to W. A consequence of this change is that the range of
permissible values at node 4 no longer includes the value X. Thus the 
change at node 2 precludes the assignment of X at node 4, and the latter as- 
signment is wiped out. In situations of this sort, when a modification pre- 
cludes an edit already made at a descendant node, a CAHR-based applica- 
tion will typically warn the user of the effects of the modification. 
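
The post-editing behavior described above can be sketched as a replay of the incremental deltas. This is a hedged illustration only: we assume a linear five-node proof with a single attribute, we model the deletion of a node simply by dropping its delta (node renumbering is ignored), and the names `refines` and `replay` are hypothetical.

```python
# Values encoded by the maximally informative values they are consistent
# with (an illustrative assumption about the semilattice of Figure 6).
CONSISTENT = {
    "T": {"X", "Y", "Z"},
    "U": {"X", "Y"},
    "V": {"X", "Z"},
    "W": {"Y", "Z"},
    "X": {"X"},
    "Y": {"Y"},
    "Z": {"Z"},
}


def refines(new, old):
    """True if `new` carries at least the information carried by `old`."""
    return CONSISTENT[new] <= CONSISTENT[old]


def replay(edits, nodes=5, initial="T"):
    """Recompute the value displayed at nodes 1..nodes from the deltas.

    `edits` maps a node number to the value assigned there. An edit that
    no longer refines the inherited value is dropped and reported, so the
    application can warn the user.
    """
    shown, dropped, value = {}, [], initial
    for n in range(1, nodes + 1):
        if n in edits:
            if refines(edits[n], value):
                value = edits[n]
            else:
                dropped.append(n)
        shown[n] = value
    return shown, dropped


original = replay({2: "V", 4: "X"})       # the proof of Figure 6
node_2_deleted = replay({4: "X"})         # T persists until node 4
changed_to_u = replay({2: "U", 4: "X"})   # X at node 4 is still permitted
changed_to_w = replay({2: "W", 4: "X"})   # X at node 4 is precluded
```

In the last case the list of dropped edits contains node 4, which is the information an application would need in order to warn the user before committing the change.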

3.5 Heterogeneous inference rules

It is widely assumed that with the introduction of graphical representa- 
tions into a proof comes a necessary informality in the reasoning that can be 
carried out. Our Hyperproof application, [Barwise & Etchemendy, 1994], 
demonstrates that this is not in general the case. Provided that the non- 
sentential representation has a sufficiently well-defined semantics and that 
there is some overlap between the expressive power of the graphical and 
sentential representations participating in a heterogeneous system, it is pos- 
sible to implement a completely formal heterogeneous deduction system. 

Consider the office assignment example of Figure 4. In this figure the 
sentential information has been expressed in the English language, for ease 
of presentation. We trust that the semantics of these sentences is sufficiently 
clear that the possibility of formalizing these sentences and the inferences 
contained within is apparent. 

An example of a heterogeneous deduction rule is the Apply rule used to 
justify several steps in the proof illustrated in Figure 4. In these examples, 
information expressed in sentential form is “applied” to the graphical repre- 
sentation; that is, information is used to justify specific modifications of that 
representation. For example, at step 5.1:2 we apply the sentence at step 3 to 
the diagram at step 5.1:1. There is a unique way to update the diagram at 
step 5.1:1 consistent with the information at step 3, and so provided the se- 
mantics of the two representations were explicitly elucidated this inference 
could be formalized and automatically validated. 

Another heterogeneous rule is used at the last step of subproof 5.2:2.1 of
Figure 4. Here we “close” a subproof on the basis that the information
available is contradictory. In sentential reasoning, this is standardly achieved
by deriving a sentence and its negation, but here we are asserting that the
diagram in step 5.2:1.1:2 contradicts the sentence at step 3 of the proof, since
Charles and Alan are depicted as occupying adjoining offices. 

A final example of heterogeneous deduction is given by the splitting into 
cases and subsequent merging of those cases. In sentential reasoning, a dis- 
junctive sentence is usually used to demonstrate that a collection of sub- 
proofs exhausts the range of possible situations. For example, a sentence 
might assert that a particular object is either a tetrahedron or a dodecahedron. 
To utilize this fact we would construct two subproofs, one (sententially) as- 
suming that the object is a dodecahedron, and the other assuming that it is a 
tetrahedron. If it is possible to reach a conclusion common to both of these 
cases, then that conclusion can be promoted out of the subproofs into the 
containing proof.

In standard first-order logic, the splitting and promotion is achieved by 
the use of a single inference rule called disjunction elimination. However, 
we can isolate two facets of the rule. One is the production of a range of
cases, which must exhaust all of the alternatives represented by the
disjunction, and the second is the promotion rule, which permits the extraction
of information common to the subproofs into the containing proof. We can
generalize these observations to the heterogeneous case.

Notice that in Figure 4 the individual subproofs of step 5 are headed not 
by sentences but by diagrams, each of which represents a different assign- 
ment to the large office. If we are to extract information from these cases we 
need to demonstrate that these cases are exhaustive, as indeed they are given 
the sentence in step 2. Since diagrams are typically bad at expressing dis- 
junctive information, it is not unusual for a sentence to be the source of the 
justification that a set of cases is exhaustive. 

At step 6 of the example of Figure 4 we have used an inference rule
called Merge to promote information common to the cases into the main
proof. In the sentential case the analog is to establish a sentence common
to all subproofs, and to promote this sentence to the containing proof (on the
basis that since the cases exhaust all possibilities, and each case entails this
information, it must be valid to assert the information outside of any
specific case). In the diagrammatic case, we could insist that each case
contains the same diagram, and that this diagram is promoted, but to be more
general we can allow the promotion of the diagram that contains the
information common to diagrams in all subproofs.
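
For a diagram with a single attribute drawn from the information semilattice of Figure 6, the more general form of merging can be sketched as follows. This is an illustration under stated assumptions rather than the Hyperproof rule itself: values are encoded by the maximally informative values they are consistent with, and the function name `merge` is hypothetical.

```python
# Illustrative encoding of the seven values of Figure 6.
CONSISTENT = {
    "T": {"X", "Y", "Z"},
    "U": {"X", "Y"},
    "V": {"X", "Z"},
    "W": {"Y", "Z"},
    "X": {"X"},
    "Y": {"Y"},
    "Z": {"Z"},
}


def merge(case_values):
    """Return the strongest value whose information is common to all cases."""
    if not case_values:
        raise ValueError("merge needs at least one case")
    # Everything that some case leaves possible must stay possible.
    covered = set().union(*(CONSISTENT[v] for v in case_values))
    candidates = [v for v, s in CONSISTENT.items() if covered <= s]
    # The most informative candidate is the one that allows the least.
    return min(candidates, key=lambda v: len(CONSISTENT[v]))


same_diagram = merge(["X", "X"])   # identical cases promote unchanged
two_cases = merge(["X", "Y"])      # only "X or Y" is common to both
mixed_cases = merge(["U", "Z"])    # nothing beyond the null value is common
```

When every case contains the same value the merge returns that value, which corresponds to the stricter requirement that each case contain the same diagram.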



4. GENERALIZING TO HETEROGENEOUS 
REASONING 

Figures 9 and 10 represent examples of the kind of reasoning that we 
hope to capture using applications based on the full CAHR. 

Figure 9 shows a simple proof prepared by an architect designing an ad- 
dition to an existing house (shown at step 1). The client wants to add a guest 
bedroom and bath; the architect's proposed solution is shown at step 3. This 
solution is the result of a structured reasoning process that is recorded in the 
proof. The proof is a record of the rationale for the decisions incorporated
into the final design, and can be used to present the reasoning to colleagues 
and clients. 

Figure 10 illustrates an application of CAHR to the very different domain 
of financial planning. In this example, the user is deciding between a mort- 
gage at 7.5% interest, with two points, and one at 8% interest, with no 
points. The user envisages two salient scenarios: one in which she is trans- 
ferred out of town in five years; the second where she keeps the house for 
ten years. The reasoning shows that under either scenario, the optimal choice
is the former mortgage. 
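
The arithmetic behind this comparison can be sketched as follows. The sketch rests on assumptions that are not given in Figure 10: a hypothetical loan of 200,000 on a 30-year fixed-rate schedule with monthly payments, each point treated as an upfront fee of 1% of the principal, and "interest paid while the loan is held, plus points" as the metric. The function names are ours.

```python
def monthly_payment(principal, annual_rate, months):
    """Level payment of a fully amortizing fixed-rate loan."""
    r = annual_rate / 12
    return principal * r / (1 - (1 + r) ** -months)


def cost_of_borrowing(principal, annual_rate, points, held_months,
                      term_months=360):
    """Interest paid while the loan is held, plus the upfront cost of points."""
    r = annual_rate / 12
    payment = monthly_payment(principal, annual_rate, term_months)
    balance, interest = principal, 0.0
    for _ in range(held_months):
        charge = balance * r          # interest accrued this month
        interest += charge
        balance -= payment - charge   # the rest of the payment repays principal
    return interest + principal * points / 100


PRINCIPAL = 200_000  # hypothetical loan amount
options = {"7.5% with two points": (0.075, 2), "8% with no points": (0.08, 0)}
scenarios = {"transferred after five years": 60, "keeps house ten years": 120}

# For each scenario, pick the option with the lowest cost of borrowing.
best = {
    name: min(options,
              key=lambda o: cost_of_borrowing(PRINCIPAL, *options[o], months))
    for name, months in scenarios.items()
}
```

Under these assumptions the mortgage at 7.5% with two points is selected in both scenarios, in agreement with the conclusion reported for Figure 10. A shorter holding period or a different metric could change the outcome.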

We note that in neither of these examples is the user of the application
performing deductive reasoning, but rather engaging in a kind of informal
reasoning which is appropriate for these domains. In neither case, for
example, does the user consider a provably exhaustive set of alternatives, and
then reason to a conclusion necessitated by consideration of these
alternatives. In these applications we are not using a CAHR-based application
to provide any kind of logical certainty, but rather simply to structure and
record the reasoning that is being carried out in the search for an acceptable
solution for our task. We believe that such reasoning is the rule rather than
the exception in real-world reasoning situations.

Our CAHR allows the construction of applications that support this kind 
of reasoning. Systems of graphical representation vary widely in the extent 
to which they have fixed or intended interpretations. For example, circuit 
diagrams, architectural drawings, and PERT charts have highly conventional
interpretations, while the representations created in free-form drawing pro- 
grams have no fixed interpretation, yet can be employed to represent a wide 
range of contents. Because of this, implementations of CAHR will differ in 
the extent to which they can antecedently recognize which modifications of a 
graphical representation are intended by the user to be informationally sig- 
nificant. Hence, implementations will differ in the extent to which they rely 
on the user to guarantee monotonicity versus imposing editing constraints 
whose goal is to enforce it. 

In the example of Figure 9, changes to the plan are made by a series of
incremental edits that are consistent with the structure of the proof. Within
each subproof the information is accumulated on the diagrammatic
representation. However, when using a diagrammatic representation which lacks a
good specification of information-bearing modifications to the diagram, or to
provide the most flexible tool possible for a particular domain, we introduce
the notion of an unconstrained incremental edit.

Unconstrained incremental edits are similar to the constrained incre- 
mental edits introduced earlier, but such an edit made to graphic Gn places 
no additional constraints on the modifications that can be made at descen- 
dant graphics. Thus, if we think of an arbitrary attribute of the graphic Gn as 
having both a displayed value and a range of permissible values (those that 
can be assigned to that attribute at Gn), then an unconstrained incremental 
edit to Gn modifies the displayed values of one or more attributes in Gn (and 
its descendants), but does not change the range of permissible values associ- 
ated with any attributes of descendant graphics. In particular, we note that 
the edit could be undone without returning to the node at which it was made, 
since the information represented before the edit (whatever that was) is still 
among the range of permissible values. 

In Figure 11 we contrast the effects of an unconstrained incremental edit
and a permanent incremental edit made at the same node. In both proofs an 
incremental edit is made at node 3, where the value of the attribute is 
changed from T to X. Where the permanent edit differs from the uncon- 
strained edit is in its effect on the range of permissible values of the attribute 
at subsequent nodes. In the proof on the right, the possible values reduce to 
X in the nodes following the one at which the change is made. Thus in the 
proof on the left, the user would be permitted to further edit the value of the 
attribute at nodes 4 and 5; in the proof on the right, the value is fixed at these 
nodes. 

Figure 11. Constrained and unconstrained editing
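
The contrast drawn in Figure 11 can be sketched by tracking, for each node, both the displayed value and the range of permissible values. This is a minimal illustration assuming a linear five-node proof and a single attribute; the function `propagate` and the edit kinds `"permanent"` and `"unconstrained"` are hypothetical names.

```python
ALL = {"T", "U", "V", "W", "X", "Y", "Z"}


def propagate(edits, nodes=5, initial="T"):
    """Return {node: (displayed value, permissible values at that node)}.

    `edits` maps a node number to a pair (value, kind), where kind is
    "permanent" or "unconstrained".
    """
    value, allowed, out = initial, set(ALL), {}
    for n in range(1, nodes + 1):
        here = set(allowed)  # range in force at this node
        if n in edits:
            new, kind = edits[n]
            if new not in here:
                raise ValueError(f"{new} is not permitted at node {n}")
            value = new
            if kind == "permanent":
                allowed = {new}  # descendants can no longer change it
        out[n] = (value, here)
    return out


left = propagate({3: ("X", "unconstrained")})
right = propagate({3: ("X", "permanent")})
```

Both proofs display X from node 3 onward. They differ only in the range recorded at nodes 4 and 5, which is the full set on the left and the single value X on the right.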



Some implementations will allow unconstrained incremental editing, ei- 
ther because the graphical representations they employ do not have suffi- 
ciently well-defined interpretations capable of supporting an antecedently 
specified system of editing constraints, or because the nature of the
reasoning tasks targeted by the application requires the use of unconstrained editing.
This is frequently the case when the reasoning supported has a temporal or 
planning dimension. In such applications, a piece of reasoning may begin 
with a graphical representation of the current state of the subject matter in 
question (e.g., the current design of a product), and proceed to reason about 
possible modifications of that state (design). In such cases, any feature of 
the current state that is potentially subject to modification in the solution will 
be represented by an unconstrained incremental edit in the initial graphic, 
while changes to the design can appropriately be made with constrained (or 
permanent) incremental edits. Applications that allow both permanent and 
unconstrained incremental editing may provide devices for distinguishing 
features of a graphic that are permanent from those that are not, and may 
also provide functions that enable users to “lock” a previously unconstrained 
incremental edit at a node, that is, make the feature behave like a permanent 
incremental edit made at that node. 

4.1 Presentational editing

A significant component of the value of a CAHR-based application for 
reasoning is that the record produced serves as a kind of documentation for 
other stakeholders in the reasoning process. For example, in the case of
architectural design there may be many designers on a project, each
specializing in a different aspect of the design; there will be a client with
an interest in the justifications of decisions made; and there will be
construction engineers with an interest in understanding the design for the
purposes of realizing an
artifact. The use of a common document recording decisions made using
representations in common use within each community serves to facilitate 
communication between the different stakeholders. 

To allow the most flexible tools possible, an implementation of CAHR 
may allow the user to make certain kinds of modifications whose purpose is 
not to increment the information contained by the graphic in which the 
modification is made, but rather to comment on or make explicit some aspect 
of the reasoning that otherwise might be missed or undervalued. We de- 
scribe two types of modification of this sort: spot edits and backdrop edits. 

A spot edit made to a graphical representation G at node N is displayed 
only in the graphic Gn appearing at N. A spot edit does not affect either the 
display or editing constraints in effect in any other instance of G. The pri- 
mary purpose of spot edits is presentational rather than informational. A 
spot edit is useful for highlighting or annotating particular characteristics of 
the graphic at a specific node. For example, a spot edit might be introduced 
to bring the user's attention to subtle incremental edits made at that node. 
(The gray highlighting in Figures 6, 8 and 11 is an example of spot editing.)

A backdrop edit made to G at node N is displayed at every graphic in the 
proof that is an instance of G, except a) in graphics that contain, or are de- 
scendants of graphics that contain, incremental edits which preclude the dis- 
play of the backdrop edit, or b) in graphics that contain spot edits which pre- 
clude the display of the backdrop edit. 

The value of a backdrop edit can be modified at any node in which it is 
displayed. Thus for purposes of editing and display, a backdrop edit is 
treated like an unconstrained incremental edit made at the node of origin of 
the graphical representation. 
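
The display rules for spot and backdrop edits can be sketched as a simple precedence scheme. The sketch assumes a linear proof in which the ancestors of a node are the earlier nodes, a single attribute (say, a fill color), and that any spot or incremental edit to that attribute precludes the display of the backdrop edit. The precedence of a spot edit over an inherited incremental edit is our assumption, and the function name `displayed` is hypothetical.

```python
def displayed(node, incremental, spot, backdrop, default="white"):
    """Value of one attribute as displayed at `node` of a linear proof.

    incremental: {node: value} for incremental edits, inherited by descendants
    spot:        {node: value} for spot edits, shown only at their own node
    backdrop:    value of a backdrop edit, or None if there is none
    """
    if node in spot:
        return spot[node]  # presentational, local to this node
    inherited = [n for n in incremental if n <= node]
    if inherited:
        return incremental[max(inherited)]  # most recent incremental edit
    return backdrop if backdrop is not None else default


shown = [displayed(n, {3: "blue"}, {2: "yellow"}, "grey")
         for n in range(1, 6)]
```

Here the backdrop value appears at node 1, the spot edit appears only at node 2, and the incremental edit made at node 3 is inherited by nodes 4 and 5.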

The purpose of backdrop edits is to introduce or change pervasive
features of the graphical representation throughout the proof. They are
appropriate for graphical features that are either meant to carry no significant
information (e.g., the background color of a diagram or the font in a
spreadsheet cell) or meant to represent background information presupposed
throughout the reasoning (e.g., the property lines in an architectural
drawing).

A graphical editor provided by a CAHR-based application may or may
not support these or other forms of presentational editing, and those that do 
support presentational editing may restrict its use in various ways. For ex- 
ample, an application may restrict spot and backdrop edits to special “layers” 
of the graphic, allowing only incremental editing on the graphic's primary 
layer. 

4.2 Justifications in heterogeneous reasoning 

A CAHR-based application will, in general, provide a collection of rules 
for justifying reasoning steps. The notion here is a generalization of that of 
inference rules in deduction. Applications may generalize these constraints
for reasons of convenience or intractability. It may not be deemed necessary
that every move be justified, for example, or justifications may be
expressed in natural language, which may leave them unverifiable by any
syntactic means. An application may also provide a facility for users to
introduce their own rules for justifying steps.

An implementation of CAHR can provide various levels of support for 
rules. An application in a domain where it is possible and desirable for all
reasoning steps to be formally checked can provide verification support for
a given rule or rules within the system. Partial verification support is
available for systems which are able to verify some but not all uses of the
rule, for reasons of tractability perhaps. Citation support for a given rule
allows the user to mark a node as justified by the rule and requires the user to
indicate the supports for the justification, but cannot verify the legitimacy of 
the rule's application. The example of Figure 4 is intended to represent an 
application in which at least citation support is available as indicated by the 
presence of the names of rules and the cited support to the right of each step. 
Finally, an application may omit the justification mechanism entirely, pre- 
serving only the proof structure of the user's reasoning and the ability to 
make unstructured annotations to the node.

This variety is intended to make the architecture applicable to a wide 
range of domains. If applied to the realm of architectural design, for
example, some design decisions may be made on aesthetic grounds, and any
requirement that this be formally justified would impede the recording of such
decisions (pending a formal theory of aesthetics). Other decisions in the
same domain may be made on the basis of the legal building code. While it 
is presumably possible to determine whether a design meets the building 
code, it may be impossible in practice to implement such a check. Citation 
support at least permits the user to cite the parts of the legal code that, say, 
put a proposed design outside of the code as part of the justification for not 
further considering the design. 






For example, the justification at step 2.2:22:2 in Figure 9 presumably 
appeals to the building code in force as part of the justification, which is not, 
but perhaps could be, represented as part of the proof. Similarly, the
justification at step 3 presumably appeals to some criteria enabling the judgment
that this is the best solution, for example the specification of the project, the
budget for the project, or the architect's aesthetic sense. In the first two
cases, it is again possible that this information might be represented within 
the proof structure to make this appeal to them more explicit. 

The rules provided by an application can be either homogeneous or het- 
erogeneous. Homogeneous rules specify legitimate reasoning steps that em- 
ploy a single type of representation. Examples include rules that allow the 
inference of a sentence from other sentences, or the inference of a Venn dia- 
gram from other Venn diagrams. Heterogeneous rules specify legitimate 
reasoning steps that employ or can be applied to more than one type of rep- 
resentation, for example a rule that legitimates the inference of a sentence 
from a Venn diagram (for a discussion of inference with Venn diagrams see 
[Hammer, 1995; Shin, 1995]). Here we single out five important types of 
rules: 

1. Assumption rules: Each proof (including subproofs) contains a (possibly
empty) initial sequence of nodes that provide information assumed
for the remainder of the proof or subproof. These nodes are justified
using an assumption rule and require no supports. An application may
provide a variety of assumption rules to be used in different contexts,
corresponding to different types of assumptions.

The proof illustrated in Figure 10 uses three types of assumption rules, 
which we have called Given, Option, and Scenario. The first marks the 
information assumed in the problem; the second indicates possible 
choices open to the user; and the third allows the user to entertain possi- 
bilities outside her control. Steps justified by means of these different as- 
sumption rules may figure differently in other reasoning steps and proce- 
dures. The Given rule is also used explicitly in Figure 10 and implicitly 
at the first step of Figure 9. 

2. Transfer rules: Transfer rules allow the transfer of information from 
one representation to another. Examples include rules that allow the user 
to express in sentential form information that is present in a graphical 
representation at an earlier accessible node, or rules that allow the modi- 
fication of a graphical representation based upon information expressed 
in sentences at accessible nodes. The Apply rule is an example of a
sentence-to-diagram transfer rule.

We note that in the general case, there may be many representations in 
play during a piece of reasoning, and that transfer rules may involve 
transferring information between representations in a many-to-many 
fashion. 

3. Case rules: As observed above, case rules may be used to justify a node
that contains a set of subproofs. The most important class of case rules is 
what we call exhaustive cases rules. An exhaustive cases rule allows the 
user to break into a collection of alternatives (“cases”) that are jointly ex- 
haustive (that is, one of which must hold). The cases are specified by the 
initial (assumption) nodes of the subproofs contained by the node being 
justified. Exhaustive cases rules permit cases that are specified by vari- 
ous types of representation (sentential or graphical), and which are sup- 
ported by nodes containing various types of representation. 

Not all case rules will require an exhaustive set of cases. In the finan- 
cial planning example of Figure 10 the user is deciding between two dif- 
ferent mortgage options. In practice there may be very many available 
deals, and in principle they might be considered as a range of exhaustive 
cases, but more likely only representative or extremal examples need be 
considered. 

The complete range of cases may not be available even in principle. 
The architectural example of Figure 9 considers two locations for the ex- 
tension, to the south or to the east of the main house, but (assuming con- 
tinuous space) there is an infinity of possible options, with varying di- 
mensions for the new addition. Again, only extremal or representative 
cases are likely to be considered in practice. 

4. Promotion rules: Promotion rules allow users to extract information
from a set of exhaustive cases, that is, to promote information contained
in one or more of the embedded subproofs to a subsequent node in the
embedding (parent) proof. A promotion rule is used to justify a node in
the parent proof and cites as support an accessible node containing a set
of subproofs. One important class of promotion rules is what we
call merge rules: The application of a merge rule is legitimate when the
information extracted is present in each of the cases not containing a
“close” declaration at any of its nodes (see below).

The last step in Figure 9 illustrates the use of a Merge rule. In this
case, one of the subproofs has been closed, indicating that the case under
consideration is inconsistent. The merge rule then promotes the
information from the sole remaining case as representing “the way things must
be”. If there are multiple open cases, the strongest representable
information common to all open cases is promoted into the containing proof.

The final step in Figure 10 shows a different kind of promotion rule, 
one in which information is promoted as a consequence of the application 
of a metric to the various options, in this example the option with the
minimum interest payment in corresponding scenarios. 

We note that different promotion rules might be applicable depending 
on the nature of the cases that they are acting upon. For example, we 
discussed above the assumption rules Option, which reflects a choice on 
the part of the user, and Scenario, reflecting a foreseeable state of the
world that might arise outside of the user's control. When promoting an 
outcome from a range of cases considering different Options, we should 
use an optimizing promotion rule which preserves the best case. In con- 
trast, it is not obvious whether to use an optimizing, averaging, or pes- 
simizing promotion rule when promoting outcomes from a range of Sce- 
narios. The most conservative approach would be to use a pessimizing 
rule in a manner analogous to the familiar minimax procedure. 

In the case of logical deduction, promotion rules are pessimizing in the
sense that only information known in all cases can be promoted from a
range of Assumption cases (and then only if the cases are known to be
exhaustive).

Reasoning by cases is fundamentally hypothetical and disjunctive. It 
is hypothetical in that the reasoning within a particular case is based on 
assumptions that need not hold. It is disjunctive in that we are in general 
left with multiple open alternatives when we conclude our consideration 
of the cases. For this reason, promotion rules are essential if our goal is 
to find (and justify) a unique or optimal solution to a reasoning task. 

5. Declaration rules: Declaration rules are used to justify assertions made 
about the state of the proof at the node in question. Examples include 
declarations that a case (subproof) is closed or that a case is consistent 
with the information assumed in the proof. Declaration rules specify the
conditions under which the declarations can be made. 

Several examples of the Close rule are illustrated in Figures 4 and 9. 
These proofs also highlight the variety of reasons for which cases may be 
closed in the course of solving a reasoning task. In Figure 9, cases are 
ruled out for aesthetic and legal reasons; in Figure 4, cases are closed be- 
cause they are inconsistent with the constraints given in the problem. 

Other declarations are possible. It might be necessary to demonstrate 
that the information expressed in a set of representations is consistent, 
when the representations are interpreted conjunctively. For example, 
imagine being presented with a collection of constraints, and a candidate 
solution. The goal of the reasoning is to show that the candidate solution 
is in fact a solution to the problem posed by the constraints. We need to 
conclude the proof with a declaration that the constraints are all satisfied 
in the solution design. 
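
The interaction between Option and Scenario cases discussed under promotion rules can be sketched as a two-level promotion over a table of costs. The figures below are invented purely for illustration, lower cost is assumed to be better, and the function names `promote_scenarios` and `promote_options` are hypothetical.

```python
def promote_scenarios(outcomes, rule):
    """Promote one cost out of a range of Scenario cases."""
    values = list(outcomes.values())
    if rule == "pessimizing":
        return max(values)  # worst case, since lower cost is better
    if rule == "optimizing":
        return min(values)  # best case
    if rule == "averaging":
        return sum(values) / len(values)
    raise ValueError(f"unknown rule: {rule}")


def promote_options(costs, scenario_rule="pessimizing"):
    """Optimize over Options after promoting each Option's Scenarios."""
    scored = {option: promote_scenarios(by_scenario, scenario_rule)
              for option, by_scenario in costs.items()}
    return min(scored, key=scored.get), scored


# Hypothetical costs for two options under two scenarios.
costs = {
    "A": {"s1": 10, "s2": 30},
    "B": {"s1": 18, "s2": 24},
}

cautious_choice, _ = promote_options(costs, "pessimizing")
hopeful_choice, _ = promote_options(costs, "optimizing")
```

With a pessimizing rule over scenarios followed by an optimizing rule over options, the procedure is the minimax choice: option B is selected because its worst case is the least bad. With an optimizing rule over scenarios, option A is selected instead, which shows how the choice of promotion rule can change the conclusion.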

A CAHR-based application may implement rules explicitly or implicitly.
By this we mean that the user may or may not be required to understand and
choose rules from an explicitly presented range of options. For example, an
application may treat the first node in every proof or subproof as an assump- 
tion; it may treat every node containing a set of subproofs as an exhaustive 
range of cases; and it may contain routines that automatically apply an ap- 
propriate promotion rule in the node immediately following a range of cases. 



5. APPLICATIONS OF THE ARCHITECTURE 

As we have hinted throughout this chapter, we see applications based on 
the CAHR architecture as being useful in a variety of domains, in particular
in the realm of design. There are a number of reasons for this. The first is 
that the architecture explicitly supports the use of graphical representations 
which are pervasive in design domains. Because designed artifacts occupy 
space in the world, graphical representations of space such as architectural 
plans, wiring diagrams, circuit diagrams and the like are natural for such 
tasks. Other similarly spatial tasks, such as determining possible foldings of
complex molecules, would also naturally fall into this category (indeed this
too can be seen as a design task: to design a folding that matches the
constraints obtained from experimental data).

The design of complex artifacts typically involves teams of designers 
each with a different specialization. For example, the design of a building 
might involve designers concerned with the structure of the building, electri- 
cians concerned with the placement of generators, and the wiring of the 
building, plumbers concerned with routing pipes, landscape architects con- 
cerned with exterior access to the building and so on. Each of these special- 
ists may use a different representation of the artifact appropriate for their 
own interests. However, these specialists do not work in isolation; they are
collaborating to build a single artifact, and consequently, the design reason- 
ing will naturally involve all of these representations interpreted conjunc- 
tively. The need for an architecture able to support heterogeneous reasoning 
naturally falls out of this kind of collaborative task. 

We think that one of the most exciting aspects of this architecture is the 
promise of producing documents that record the rationale for the decisions 
made in the process of producing designs of complex artifacts, or more gen- 
erally of complicated decision making processes. Documents constructed 
using this architecture will allow designers to replay their reasoning for col- 
leagues, clients and for themselves at a later date. Rather than presenting a 
client with a proposed design for the extension to their house, an architect 
would be able to replay the reasoning resulting in that design. If the client or 
a colleague were to disagree with the justification associated with a step, the 
reasoning at that step might be reconsidered. For example, the architect may 
judge a candidate design unacceptable on various grounds, which are in fact
acceptable to the client. 

More importantly, the ability to capture the rationale for the design of a 
complex artifact will ease the task of maintaining that artifact over time. 
This is of particular importance when the longevity of the artifact approaches 
that of the team that designed it. If the rationale for the particular decisions
is lost, then future maintenance becomes problematic, since the features of
the design that must be maintained are not necessarily obvious.

We recognize that the task of recording the structure of reasoning and 
providing justifications for individual inference steps can interfere with the 
performance of the reasoning task. The degree to which such information is 
demanded by an application based on our architecture will depend on the 
perceived importance of collecting this information for a given reasoning 
task. In any case, user interfaces must be designed to minimize the intru- 
siveness of the process of recording this information about the reasoning 
task. 



6. CONCLUSIONS AND FURTHER WORK 

We have described a computational architecture for heterogeneous rea- 
soning. The architecture is a general framework for building applications to 
support users in heterogeneous reasoning tasks — tasks involving multiple 
representations for information. Of particular interest are graphical and sen- 
tential representations which have markedly different characteristics. Het- 
erogeneous reasoning tasks are very common; indeed, we believe that het- 
erogeneous reasoning tasks are at least as common as homogeneous, senten- 
tial tasks. 

Our architecture has been developed by considering the generalizations
necessary to the notion of formal reasoning represented by natural
deduction proofs. In our architecture, a proof represents a (possibly
incomplete) piece of reasoning. Proofs are recursive structures whose basic
elements are nodes, each of which may contain instances of
information-bearing representations. Central to the architecture is an analysis of how
information flows through proofs, and how this constrains the actions
available to the reasoner at different nodes in the proof.

The architecture does not mandate the specification of justifications or
the manner in which they are to be collected if their use is supported. We
believe that the degree to which specifying and collecting justifications for
reasoning is critical varies from domain to domain, and that the corresponding
degree of intrusion into the reasoner’s process that is warranted by the
desire to collect justifications varies proportionally. As a consequence, it is
up to the individual implementer to design appropriate user interfaces for
eliciting relevant information during the reasoning process, in a manner that
is sensitive to the domain at hand.

We believe that our architecture may be useful in a wide range of appli- 
cation domains, including but not limited to design of complex artifacts. 
Proofs may be viewed as means for capturing rationales involved in the solu- 
tion of complex reasoning problems, since they record not only the eventual 
outcome of the reasoning, but also the history of the decisions made in pro- 
ducing that outcome. 

We are currently implementing our architecture as a Java application
framework called Openproof. When implemented, the framework will accept
implementations of editors and inference engines for particular
representations as plug-ins. By providing combinations of such plug-ins, we hope to
facilitate the development of arbitrary heterogeneous reasoning
environments.



7. EXERCISES AND PROBLEMS 

1. Using the CAHR representation, prepare a heterogeneous proof sketch 
demonstrating the solution to the following problem. Your sketch should 
resemble that of Figure 4, with plausible justification rules of your own 
devising. For each such rule, describe clearly the criteria for its successful
application, and classify the rule according to the five types of rule
described earlier.

Harry and his wife Harriet gave a dinner party. They invited Harry’s
brother, Barry, and Barry’s wife Barbara. They also invited Harry’s sis- 
ter, Samantha, and her husband Samuel. Finally, they invited Nathan and 
Nathan’s wife Natalie. While they were seated around the table, one per- 
son shot another. 





Figure 12. [Seating diagram: chairs arranged around the table, with one chair marked K and one marked V.]

The chairs were arranged as shown in the diagram. The killer sat in the
chair marked K. The victim sat in the chair marked V. Every man sat 
opposite his wife. The host was the only man who sat between two 
women. The host did not sit next to his sister. The hostess did not sit 
next to the host’s brother. The victim was the killer’s former spouse. 

2. Speculate on the costs and benefits of adopting a CAHR-based
environment for capturing the structure of arguments and rationales for
applications such as: architectural design of an individual residence, planning
a mission to Mars, designing a CPU chip for the next generation of com- 
puter, designing a nuclear power station, developing a survey instrument, 
which will be reused for over forty years, for collecting complex eco- 
nomic data. What features would a task have to make the adoption of 
CAHR most necessary and/or desirable? 

3. Formal deduction, in mathematics for example, provides logical certainty
of results that are deduced. By allowing arbitrary justifications that are
not formally verifiable by software, the CAHR architecture has sacrificed
this certainty. What (if anything) has been gained in its place? What
would be lost if one were to insist on the formal verifiability of all
justifications? Would mathematical practice be representable in such a system?
For example, could Wiles’ proof of Fermat’s Last Theorem be so represented
(in principle and in practice)?

4. When working with a patient a doctor typically describes only the diag- 
nosis reached and actions taken, and not the alternatives that were con- 
sidered and rejected. To what extent do you as patient care about these 
alternatives? What stakeholders might benefit from a complete record of 
the doctor’s reasoning and which would not? What social forces would 
act to encourage and resist the widespread adoption of CAHR-based
rationale capture if it were available in the medical domain?

ALGEBRAIC VISUAL SYMBOLISM FOR
PROBLEM SOLVING

Iconic equations from Diophantus to the present

Abstract: This chapter provides a discussion of mathematical visual symbolism for
problem solving based on the algebraic approach. It is formulated as lessons that
can be learned from history. The visual formalism is contrasted with text
through the history of algebra, beginning with Diophantus’ contribution to
algebraic symbolism nearly 2000 years ago. Along the same lines, it is shown
that the history of art provides valuable lessons. The evident historical success
provides a positive indication that similar success can be repeated for modern
decision-making and analysis tasks. Thus, this chapter presents the lessons
from history tuned to new formalizations in the form of iconic equations and
iconic linear programming.

Keywords: Mathematical visual symbolism, problem solving, algebraic symbolism, iconic
equations, iconic linear programming, iconic algebraic expressions.



1. VISUAL SYMBOLISM VS. TEXT 

1.1. Diophantus and the beginning of algebraic symbolism

Mathematicians know two famous visual forms of problem solving: 

• algebraic symbolic reasoning and 

• geometric diagrammatic reasoning 

although typically algebraic formalism is not viewed as a visual reasoning
approach. In this chapter we show that algebraic reasoning is in fact visual
reasoning and that it has been very efficient throughout history. Indeed, in
many cases it is a better choice than textual or geometric reasoning. The
invention of algebraic symbolism was critically important for progress in
developing the algebra necessary for solving linear, quadratic and other equations.
The reason is simple: it is almost impossible to solve an equation
expressed as text.

A simple modern algebraic expression, $(12 + 6n)(n^2 - 3)$, is much longer in
its textual (verbal, rhetorical) form: “a sixfold number increased by twelve,
which is multiplied by the difference by which the square of the number
exceeds three”. The expression can easily be transformed to
$6n^3 + 12n^2 - 18n - 36$ using symbols, but it would be extremely difficult to
accomplish this using text only.
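
Written out symbolically, the transformation takes two lines. The following worked expansion is added for illustration; the intermediate line is ours, not part of the historical material:

```latex
\begin{align*}
(12 + 6n)(n^2 - 3) &= 12n^2 - 36 + 6n^3 - 18n \\
                   &= 6n^3 + 12n^2 - 18n - 36 .
\end{align*}
```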

The history of the invention of algebraic mathematical symbolism is
quite dramatic and spans some 2000 years. It is traced to Babylonian
mathematics and to the “father of algebra,” the Greek mathematician Diophantus
(about A.D. 200-284).

Europe was not aware of his “Arithmetica” until 1570, when the book,
written in Greek and preserved by the Arabs, was translated into Latin. But
even this translation, delayed for more than 1000 years, was not published.
The first translation with known impact on European mathematics is due to
Bachet, who translated it in 1621. Fermat read it, commented on it, and was
influenced by it [Swift, 1956; O'Connor & Robertson, 1999].

It seems that the history of geometric proof was no less dramatic. “There 
are some arguments that it is unlikely that the Greeks could have invented 
their notion of proof so rapidly and in isolation. Instead, it is suggested that 
the notion of geometric proof was a secret that was jealously guarded from 
all but the “inner sanctum” of the Egyptian priesthood” [Altshiller-Court, 
1964]. 

1.2. Mathematics: from verbal algebra to symbolic visual 
algebra 

Mathematical symbology developed in several ways and steps over the
centuries:

1. progressing from text to symbols directly;

2. progressing from text to symbols by first going to abbreviations and
then to symbols that may not have a direct link to a textual
representation; and

3. progressing to more and more sophisticated syntax defining how
symbols could be combined.

Below we illustrate this progress with examples from Diophantus’
Arithmetica [Geller, 1998], starting with examples of direct transition from text to
symbols. Diophantus used:

• as a symbol for an unknown, the Greek small letter sigma in the form it
takes when written at the end of a word. This form, ς, differs from the
standard small sigma, σ, and is called the final small sigma; and

• as symbols for numerical coefficients, the alphabetic Greek
numerals (α, β, γ, δ, ε, ... for 1, 2, 3, 4, 5, ...).

Diophantus also used abbreviations in progressing from text to symbols. He
introduced:

• the symbol for the constant term 1 as a capital M with a small circle
above (this symbol is the abbreviated Greek word monades for
"units");

• the symbol for an "unknown squared" (our modern $x^2$) as the first two
letters of the Greek word dunamis for "power";

• the symbol for an "unknown cubed" (our modern $x^3$) as the first two
letters of the Greek word kubos for "cube";

• the symbol for "minus", which originated from the first two letters of the
Greek word leipsis for "lacking."

Diophantus’ symbolic syntax has several components:

• the summation symbol is omitted, and addition is expressed by writing
the terms to be added in sequence,

• all negative terms follow the minus symbol, and

• the power of the unknown precedes the numerical coefficient.

There are some indirect indications that Diophantus may have been
influenced by Hindu syncopated notation, which was quite similar to his own
[Geller, 1998].

1.3. Lessons from the history of algebra and calculus 

Below we present an extract from the English translation of a real
historical mathematical text, the first book on algebra (A.D. 830), “Al-jabr
wa'l-muqabala” by M. Al-Khwarizmi [English translation: Al-Khwarizmi,
1974; Parshall, 1988]. The term “algebra” comes from the title of this book, and
the term “algorithm” is derived from the name of its author.

“...a square and 10 roots are equal to 39 units.

The question therefore in this type of equation is about as follows: what is the square
which combined with ten of its roots will give a sum total of 39?

The manner of solving this type of equation is to take one-half of the roots just
mentioned. Now the roots in the problem before us are 10. Therefore take 5, which
multiplied by itself gives 25, an amount which you add to 39 giving 64. Having taken
then the square root of this which is 8, subtract from it half the roots, 5 leaving 3.
The number three therefore represents one root of this square, which itself of course
is 9. Nine therefore gives the square.”

The modern symbolic (iconic) representation and solution is much shorter
and clearer [Parshall, 1988]:

Solve $x^2 + 10x = 39$ for $x^2$.

Solution: $x^2 + 2 \cdot 5x + 25 = 39 + 25$; $x \cdot x + 2 \cdot 5x + 25 = 39 + 25 = 64 = 8 \cdot 8$;

$(x + 5)(x + 5) = 8 \cdot 8$; $(x + 5)^2 = 8 \cdot 8$; $x + 5 = 8$; $x = 3$, $x^2 = 9$.
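
The verbal recipe quoted above is itself an algorithm, and it can be followed step by step in code. The sketch below is illustrative only: the function name `al_khwarizmi_root` and its parameters are our own labels, and it assumes an equation of the form "a square and `roots` roots are equal to `units` units" with a positive solution.

```python
import math

def al_khwarizmi_root(roots: float, units: float) -> float:
    """Positive root of x^2 + roots*x = units, following the verbal recipe."""
    half = roots / 2          # "take one-half of the roots"        -> 5
    squared = half * half     # "which multiplied by itself gives"  -> 25
    total = squared + units   # "an amount which you add to 39"     -> 64
    side = math.sqrt(total)   # "the square root of this"           -> 8
    return side - half        # "subtract from it half the roots"   -> 3

x = al_khwarizmi_root(10, 39)
print(x, x * x)  # 3.0 9.0
```

Each line of the function corresponds to one clause of the text, which shows how much bookkeeping the prose version has to carry in words.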

This evident historical success prompted us to try to repeat it for modern
decision-making and analysis tasks. Section 2 demonstrates such a development
for iconic equations. Tables 1-3 provide another appealing example of the
advantages of symbolic (iconic) representation.



Table 1. Description of linear equations using text and iconic symbology

Rhetorical, textual representation of a problem       Compressed text content in iconic form
----------------------------------------------------  --------------------------------------
Three multiplied by an unknown value plus five        $3x + 5y - 7 = 12$
multiplied by another unknown value minus seven is
equal to twelve

Six multiplied by an unknown value plus four          $6x + 4y + 2 = 8$
multiplied by another unknown value plus two is
equal to eight

Find both unknown values if they exist.               $x, y$ ?


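Table 2. Solution of linear equations in symbolic iconic form

Step  Equation(s)
----  ------------------------------------------------------
1.    $2(3x + 5y - 7) = 2 \cdot 12$;  $6x + 4y + 2 = 8$
2.    $6x + 10y - 14 = 24$;  $-(6x + 4y + 2) = -8$
3.    $10y - 4y - 14 - 2 = 16$
4.    $6y - 16 = 16$
5.    $6y = 32$
6.    $y = 32/6$
7.    $3x = 12 + 7 - 5y$
8.    $x = (12 + 7 - 5y)/3$
9.    $x = (12 + 7 - 5 \cdot 32/6)/3$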






Table 3. Textual solution: first two steps 



1. Two multiplied by open parenthesis three multiplied by an unknown value plus five
multiplied by another unknown value minus seven close parenthesis is equal to two
multiplied by twelve. Six multiplied by the same unknown value as in the first sentence
plus four multiplied by another unknown value (that is the same as the second unknown
value in the first sentence) plus two is equal to eight.

2. Two multiplied by open parenthesis three multiplied by an unknown value plus five
multiplied by another unknown value minus seven close parenthesis is equal to two
multiplied by twelve. Minus multiplied by open parenthesis six multiplied by the same
unknown value as in the first sentence plus four multiplied by another unknown value
(that is the same as the second unknown value in the first sentence) plus two close
parenthesis is equal to minus eight.

As noted above, it is almost impossible to solve systems of equations
using words alone. Table 1 gives a linear system of equations. Table 2 gives a
symbolic, iconic solution. Table 3 shows only the first two steps of the
solution presented in textual form.

The example demonstrates that iconic reasoning for finding a solution is 
much shorter than reasoning that uses only a textual representation of the 
task. Comparison of Tables 2 and 3 highlights the obvious advantages of a 
solution method using symbols/icons. Here it is important to note that the 
symbolic/iconic representation has a much more ambitious goal than merely 
visualizing the task and the solution. It really provides an efficient visual 
solution method. 
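
The symbolic steps of Table 2 are mechanical enough to be carried out by a short program. The following is a minimal sketch, not a general solver: the tuple layout `(a, b, c, d)` for `a*x + b*y + c = d` and the function name `solve_by_elimination` are our own assumptions, and the code assumes the x-coefficient of the first equation is nonzero and that the system has a unique solution.

```python
from fractions import Fraction

# The system from Table 1, written as a*x + b*y + c = d.
eq1 = (3, 5, -7, 12)   # 3x + 5y - 7 = 12
eq2 = (6, 4, 2, 8)     # 6x + 4y + 2 = 8

def solve_by_elimination(eq1, eq2):
    """Eliminate x as in Table 2, then back-substitute for x."""
    a1, b1, c1, d1 = map(Fraction, eq1)
    a2, b2, c2, d2 = map(Fraction, eq2)
    k = a2 / a1  # scale eq1 so the x-coefficients match (k = 2 here)
    # Subtract eq2 from k*eq1: x cancels, leaving one equation in y.
    y = ((k * d1 - d2) - (k * c1 - c2)) / (k * b1 - b2)
    # Back-substitute into eq1: a1*x = d1 - c1 - b1*y.
    x = (d1 - c1 - b1 * y) / a1
    return x, y

x, y = solve_by_elimination(eq1, eq2)
print(x, y)  # -23/9 16/3
```

Exact fractions are used so that the result matches the table's unreduced value y = 32/6 (that is, 16/3) without rounding.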

Today the advantages of symbolic, iconic representation and reasoning in 
mathematics are obvious, but such was not the case at the time that the 
iconic representations were invented. And it is here that we can learn an im- 
portant lesson from history. The first symbolic, iconic form was rejected. As 
noted at the end of section 1.2, in A.D. 250, Diophantus invented symbolic 
iconic representations [Geller, 1998; Miller, 2001], specifically: 



ς, M°, Δ^Υ, K^Υ

for the algebraic unknowns that have become the modern symbols

$x$, $x^0$, $x^2$, $x^3$.

For Diophantus the algebraic expression $x^3 + 2x - 3$ would be

K^Υ α ς β Λ M° γ

Table 4 derives the equivalence. Not surprisingly, the textual form at the 
bottom is ambiguous and much longer than either the modern or Diophantus’ 
notations. 

It is not clear from the text whether we have $x^3 + x \cdot 2 - 3$ or $x^3 + x \cdot (2 - 3)$. To
avoid this ambiguity the text should be even longer, perhaps: “Cube of
unknown number plus the same unknown number multiplied by two and minus
three from the whole expression preceding the three”. In both modern and
Diophantus’ notations there are 8 characters vs. 130 characters for the
text-based representation. That is, Diophantus’ notation is quite competitive even
now.

The advantages of symbolism become even more evident when we try to
combine two expressions. Let us add $x^3 + x \cdot 2 - 3$ and $2x^3 + 3x - 2$. Diophantus’
notation requires 16 symbols to write down these two expressions before
combining them:

K^Υ α ς β Λ M° γ
K^Υ β ς γ Λ M° β

and the result can be produced by adding columns:

K^Υ γ ς ε Λ M° ε

This is equivalent to $3x^3 + 5x - 5$ in modern notation. In both cases we have
still used 8 symbols.
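
The syntax rules of Section 1.2 and the column-wise addition above can be sketched in a few lines. This is an illustration under stated assumptions: the glyph strings (`K^Υ`, `Δ^Υ`, `ς`, `M°`, and `Λ` as a stand-in for the "lacking" sign) are our plain-text approximations of Diophantus' symbols, the function names are hypothetical, and only coefficients 1 to 9 and powers 0 to 3 are handled.

```python
# Greek alphabetic numerals for 1..9 (6 is written with the stigma sign).
GREEK = {1: "α", 2: "β", 3: "γ", 4: "δ", 5: "ε", 6: "ϛ", 7: "ζ", 8: "η", 9: "θ"}
# Power symbols: cube (kubos), square (dunamis), unknown, units (monades).
POWER = {3: "K^Υ", 2: "Δ^Υ", 1: "ς", 0: "M°"}
MINUS = "Λ"  # assumed stand-in glyph for the "lacking" sign

def render(poly):
    """Render {power: coefficient} in the syncopated style described above."""
    def term(p):
        # The power of the unknown precedes the numerical coefficient.
        return f"{POWER[p]} {GREEK[abs(poly[p])]}"
    powers = sorted((p for p in poly if poly[p] != 0), reverse=True)
    positive = [term(p) for p in powers if poly[p] > 0]
    negative = [term(p) for p in powers if poly[p] < 0]
    # No plus sign: added terms are juxtaposed; negatives follow the minus sign.
    return " ".join(positive + ([MINUS] + negative if negative else []))

def add(p, q):
    """Add two expressions column by column (power by power)."""
    return {k: p.get(k, 0) + q.get(k, 0) for k in set(p) | set(q)}

first = {3: 1, 1: 2, 0: -3}    # x^3 + 2x - 3
second = {3: 2, 1: 3, 0: -2}   # 2x^3 + 3x - 2
print(render(first))               # K^Υ α ς β Λ M° γ
print(render(add(first, second)))  # K^Υ γ ς ε Λ M° ε
```

The positional layout is what makes `add` a one-line operation: each power occupies its own column, so combining expressions never requires re-parsing a sentence.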



Table 4. Example of Diophantus’ symbology

Notation                                         Expression
-----------------------------------------------  ------------------------------------
Diophantus’ symbology                            K^Υ α ς β Λ M° γ
[Geller, 1998; Miller, 2001]
Modified Diophantus’ notation with space for +   K^Υ α   ς β   Λ M° γ
Diophantus’ notation sliced with extra space     K^Υ | α |   | ς | β | Λ | M° | γ
for +
Modern algebraic notation (sliced)               x^3 | 1 | + | x | 2 | - | 1  | 3
Modern algebra (with multiplication sign)        $x^3 \cdot 1 + x \cdot 2 - 1 \cdot 3$
Modern algebra (without multiplication sign)     $x^3 + 2x - 3$
Modern algebra (compact)                         $x^3+2x-3$
Textual form                                     Cube of unknown number plus the same
                                                 unknown number multiplied by two
                                                 minus three


In text we would need about 260 symbols before combining the expressions.
Note that text has no simple positional way of combining components such
as the one we used in the symbolic form.

The next example looks almost like a modern intelligence puzzle. A
collection of epigrams known as the Greek Anthology contained among its
mathematical problems a puzzle about Diophantus himself [Cohen &
Drabkin, 1958; Parshall, 2002]:

God granted him to be a boy for the sixth part of his life, and adding a twelfth part
to this, He clothed his cheeks with down; He lit him the light of wedlock after a
seventh part, and five years after his marriage He granted him a son. Alas! late-born
wretched child; after attaining the measure of half his father's life, chill Fate took
him. After consoling his grief by this science of numbers for four years he ended his
life.

Questions posed based on this text are: When was he married? When was
his son born? When did he die?

The answer is: Diophantus married at age thirty-three, had a son
when he was thirty-eight, and died when he was eighty-four.

How can one answer these questions without iconic symbols? It is a real
challenge using reasoning in a natural language, as we have seen in the examples
above, but the solution using iconic notation is simple:

$x/6 + x/12 + x/7 + 5 + x/2 + 4 = x$; $75x/84 - x = -9$; $-9x/84 = -9$;

$x/84 = 1$; $x = 84$.
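
Once the epigram is in symbolic form, the remaining arithmetic is easy to check mechanically. The sketch below is illustrative; the variable names are ours, and exact fractions are used so no rounding is involved.

```python
from fractions import Fraction

# Fractions of the lifespan x named in the epigram, plus the fixed years.
parts = [Fraction(1, 6), Fraction(1, 12), Fraction(1, 7), Fraction(1, 2)]
fixed_years = 5 + 4

# x = sum(parts) * x + fixed_years  =>  x = fixed_years / (1 - sum(parts))
x = fixed_years / (1 - sum(parts))

# Boyhood, youth, and the time before marriage are the first three parts.
married_at = (Fraction(1, 6) + Fraction(1, 12) + Fraction(1, 7)) * x
son_born_at = married_at + 5
print(x, married_at, son_born_at)  # 84 33 38
```

The three printed values are the lifespan, the age at marriage, and the age at which the son was born.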

Table 5, based on [Schroeder, 1997; Miller, 2001], presents the trend
found in later mathematics of starting from text and moving on to iconic
symbology in an attempt to rediscover Diophantus’ dormant invention. This trend
demonstrates that mathematics again went through the same three steps
Diophantus had gone through earlier:

(1) text → (2) abbreviations → (3) icons.

In this way, mathematics was able to reach the highest level of rigorous
reasoning and problem solving, which probably would not have been reached
without such iconographic knowledge representation, or at least would
have taken much more effort.



Table 5. Trend of mathematics from text to iconic symbology

Time                 Presentation    Example
-------------------  --------------  ---------------------------------------------
Before 15th century  Text            Al-Khwarizmi: “Square and 10 roots are equal
                                     to 39 units” (modern $x^2 + 10x = 39$)
                                     Fibonacci: “Divide .a. by .b. to obtain .e.”
                                     (modern $a/b = e$)
Renaissance          Abbreviation    Cardan: “1.quad.aeq.10.pos.p.144”
                                     (modern $x^2 = 10x + 144$)
From 17th century    Icons, symbols  Modern: $x^2 = 10x + 144$



Table 6 details the history of mathematical symbology [Miller, 2001],
showing what happened after Diophantus’ invention was forgotten. The first
1200 years were dark years; in 14th-century Europe, everything except
numerals was still written out in words.

In 1463, Benedetto introduced a symbol for the unknown, the Greek
letter ρ, rediscovering what Diophantus had done more than 1000 years before.
In 1494, Pacioli used m̃ as an icon for minus for the first time.

The history of calculus symbology (see Table 7) provides further insight
into the role of iconic symbology in problem solving. Newton’s symbology
for derivatives is simpler than Leibniz’s symbology and is well suited for
functions of one variable, $x(t)$, such as velocity and acceleration.

On the other hand, while it is more complex, Leibniz’s symbology is
better suited to functions of two or more variables, $f(x_1, x_2, \ldots, x_n)$. “British
mathematicians who patriotically used Newton's notation put themselves at a
disadvantage compared with the continental mathematicians who followed
Leibniz” [O'Connor & Robertson, 1997]. Note also that Leibniz’s notation
abstracts and visualizes a more complex concept.



Table 6. History of mathematical symbology (based on [Miller, 2001])

Year    Icon/symbol                                  Inventor
------  -------------------------------------------  ---------
1489    +, -                                         Widmann
1525    √                                            Rudolff
1557    =                                            Recorde
1570    a, b as known positive numbers               Cardan
1580    literal coefficients                         Viete
~1600   division sign                                Stevin
1631    multiplication ×, and ±                      Oughtred
1631    less, <; greater, >                          Harriot
1637    x, y, z as unknowns;                         Descartes
        equation ax + by = c, for positive a, b, c
1657    a, b, c as positive and negative numbers     Hudde
1698    multiplication (·)                           Leibniz
1734    ≥                                            Bouguer



Table 7. Alternative symbology for derivatives

Newton’s symbology       Leibniz’s symbology
-----------------------  -------------------
$\dot{x}$, $\ddot{x}$    $dx/dt$



The important question arising from history is: “How was it possible that, 
in spite of obvious advantages of algebraic symbolism, it was not used for 
1250 years after Diophantus invented it?” 

The dominance of Euclidean geometrical algebra is typically considered
the major reason that Diophantus’ invention was not used or further
developed for a millennium [Parshall, 1988].

Was this accidental? Close consideration shows that it was not, in much
the same way that the dominance of Ptolemy’s geocentric model was not
accidental. Geometric algebra is more directly visual; that is, it is similar to
direct modeling of physical entities that we perceive directly, such as lines,
angles and the apparent rotation of the Sun around the Earth. In contrast, in
Diophantus’ algebra we are forced to operate with abstract entities that we cannot
observe directly. Diophantus’ algebra operates with visual but abstract
concepts of unknown values, constants and arithmetic operations. Geometric
algebra operates visually with concrete objects of the real world. More
exactly, it operates with objects that have a direct match in the real world.
Thus, geometric algebra was much easier to understand. It relies much less
on abstract thinking.

Now we can return to the present and ask: “Why is there still no sophisti- 
cated visual reasoning system?” 

We feel that the reason is basically the same as it was in the time of
Euclid, Pythagoras, Diophantus and Ptolemy: sophisticated visual reasoning
requires the visualization of abstract concepts, not only of objects that have
a direct match in the real world. Visual reasoning with abstract concepts is the
real challenge that still needs to be addressed.

1.4. Lessons from the history of art

In the previous section, we demonstrated the advantages of symbolic, iconic
representation for solving mathematical tasks. The history of art provides us
with even more fascinating examples. For centuries artists were able to express
large textual pieces of the Bible in a single painting. If we consider small
paintings and sketches that were made for viewing from the same distance as
text, we can notice that they occupy much less space than the corresponding
biblical text.

Thus, artists “compress” the space occupied by the Bible’s contents. Artists
also “compressed” other texts such as myths and proverbs. The important 
advantage of “reading paintings” is that we can see the “whole story” at once 
using our parallel visual processing abilities. Thus pictures also compress 
the time required for “reading” a picture in comparison with a sequential 
reading of the text. 

Note that the text itself can be “compressed” into other text forms known 
as proverbs and sayings. Table 1 in Chapter 8 illustrates the case and permits 
the comparison of textual and visual “compressions.” It is based on Pieter 
Bruegel’s painting “Blue Cloak” that “compresses” 78 Flemish proverbs. 
Fragments of Bruegel’s painting serve as role models for icons for complex 
concepts. 

Text that has been compressed into a text-based metaphor or proverb
still needs to be read sequentially, but an image-based metaphor or icon can
be “read” as a whole. Above we illustrated a merging (summation) operation
for two algebraic expressions in symbolic form, showing that the merged
expression occupies the same space as each of the original components.
Table 1 in Chapter 8 illustrates how this is done in art. The merged text-based
metaphor practically doubles in length, but the merged picture metaphor occupies
almost the same space as each of its components, showing the same “compression”
efficiency as in the mathematical examples above.

One can ask: “How is this example from art related to the use of visual
symbolism in solving decision-making problems that we have seen in the
mathematical examples?” At first glance, there is no such relation, but the
efficiency of symbolic mathematics came in large part from its compressed
information representation. Thus, any efficient compressed visual presentation
can potentially be insightful for decision-making and problem-solving
tasks.

Not every relation in the world can be easily presented in text. Often
spatial adjacency is a natural way to foster a vision of non-textual relations
between entities. Merging two proverbs visually reveals a deeper meaning of
each proverb and of their interrelation than a text-only representation does.



2. SOLVING ICONIC EQUATIONS AND LINEAR PROGRAMMING TASKS



2.1. Iconic algebraic expressions 

Assume that we need to compute a simple sum, $S = 2 + 5 - 1 + 4 = 10$.
Next assume that we have two associated data types (lions and birds):

S = two lions + five birds - one lion + four birds.

At first, there is some uncertainty about what this sum means. If we ignore
the data types, we will get 10 abstract entities or living creatures.

But if we pay attention to the specific data types, our solution must contain two
numbers, 1 (lion) and 9 (birds), which cannot be called a sum
in standard arithmetic. To be consistent with standard arithmetic, two
separate equations can be generated: $2 - 1 = 1$ (for lions) and $5 + 4 = 9$ (for birds).
Data with a hundred different data types would be expressed in a hundred
equations in standard arithmetic. Visual, iconic arithmetic allows us to do
this in one equation.

Such visual iconic arithmetic is shown in Figure 1. It can be viewed as a 
special type of data fusion, not just counting but a process of combining 
data. 

Figure 1. Multi-sort visual/iconic arithmetic



Similarly, human population dynamics, separated by gender, could be 
computed using male and female icons. One could further compute the total 
population by ignoring gender icons. 
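
The multi-sort sum described above can be sketched as a single pass over typed terms. This is a minimal illustration; the `(count, sort)` tuple representation and the function name `multi_sort_sum` are our assumptions, standing in for the icons of Figure 1.

```python
from collections import defaultdict

def multi_sort_sum(terms):
    """Sum signed (count, sort) terms, keeping one total per sort."""
    totals = defaultdict(int)
    for count, sort in terms:
        totals[sort] += count
    return dict(totals)

# S = two lions + five birds - one lion + four birds
terms = [(2, "lion"), (5, "bird"), (-1, "lion"), (4, "bird")]
per_sort = multi_sort_sum(terms)
print(per_sort)                # {'lion': 1, 'bird': 9}
print(sum(per_sort.values()))  # 10, the total when sorts are ignored
```

One expression with typed terms replaces one equation per sort, and ignoring the sort labels recovers the untyped total.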

The same fusion task without visual iconic representation would require
introducing explicit textual data types for the entities, for example: creature,
animal, human, male and female. In physics, we set up physical
measurement scales such as kg and sec. Physical rules then prohibit us from
computing 2 km + 3 sec + 5 kg - 2 sec + 16 kg, as this summation involves different
physical modalities and measurements. To be correct, we have to compute
separate sums for km, sec, and kg: (1) 2 km, (2) 3 sec - 2 sec, and (3) 5 kg
+ 16 kg. On the other hand, computing 2 km + 3300 m + 500 m - 2 km + 160
m can be done correctly by converting all components to the same units, say
km: 2 km + 3.3 km + 0.5 km - 2 km + 0.160 km = 3.96 km.
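
The distinction between incompatible modalities and convertible units can be made explicit in code. The sketch below is illustrative only: the `UNITS` table, its dimension labels, and the function name `add_quantities` are our assumptions, not part of any standard library.

```python
# Conversion factors to a base unit, grouped by physical dimension.
UNITS = {
    "km": ("length", 1000.0),
    "m": ("length", 1.0),
    "sec": ("time", 1.0),
    "kg": ("mass", 1.0),
}

def add_quantities(terms):
    """Sum (value, unit) terms; refuse to mix different dimensions."""
    dims = {UNITS[unit][0] for _, unit in terms}
    if len(dims) != 1:
        raise ValueError(f"cannot add across dimensions: {sorted(dims)}")
    # Convert every term to the base unit of the shared dimension.
    return sum(value * UNITS[unit][1] for value, unit in terms)

meters = add_quantities([(2, "km"), (3300, "m"), (500, "m"), (-2, "km"), (160, "m")])
print(meters / 1000)  # 3.96
```

Calling `add_quantities([(2, "km"), (3, "sec")])` raises `ValueError`, which mirrors the rule that sums across different physical modalities are not meaningful.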

Iconic language syntax permits us to have a single equation for lions and
birds by adding shading. Shading can then be interpreted as ignoring
animal types. Obviously, this makes sense only if a meaningful supercategory
such as animal or creature exists. The ontology of the domain provides us
with a natural way of testing for the existence of a supercategory before
trying to compute a result. Technically, this ontological approach is
implemented like the idea of semantic zooming that is described in detail in
Chapter 10 (Section 2.2).

Semantic zooming in Figure 1 can convert both lions and birds to a gen- 
eral “creature” category as shown in Figure 2. 
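
The ontology check that precedes such a zoomed sum can be sketched as follows. This is a toy illustration: the `PARENT` table is a hypothetical ontology invented for the example, the function names are ours, and the sketch assumes a non-empty list of terms whose sorts all appear in the table.

```python
# A toy ontology: each sort points to its parent category (None at the top).
PARENT = {"lion": "creature", "bird": "creature", "creature": None, "kilogram": None}

def ancestors(sort):
    """Return the sort followed by its chain of supercategories."""
    chain = []
    while sort is not None:
        chain.append(sort)
        sort = PARENT[sort]
    return chain

def common_supercategory(sorts):
    """Most specific category shared by all sorts, or None if there is none."""
    sorts = list(sorts)
    for candidate in ancestors(sorts[0]):
        if all(candidate in ancestors(s) for s in sorts[1:]):
            return candidate
    return None

def zoomed_sum(terms):
    """Sum (count, sort) terms under their common supercategory."""
    category = common_supercategory({sort for _, sort in terms})
    if category is None:
        raise ValueError("no common supercategory; keep one total per sort")
    return sum(count for count, _ in terms), category

print(zoomed_sum([(2, "lion"), (5, "bird"), (-1, "lion"), (4, "bird")]))
# (10, 'creature')
```

The test for a supercategory happens before any arithmetic, so a request to merge lions with kilograms is rejected rather than silently summed.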




[Shaded lion and bird icons summed: = 10 creatures = 10]

Figure 2. Iconic arithmetic with shaded data types

Having a special icon for this supercategory, we can transform the
computation to a more standard form, as shown in Figure 3.

Figure 3. Iconic arithmetic for a supercategory: creature



Figures 4 and 5 illustrate the same equations with different creatures and 
corresponding icons. Both examples are based on Egyptian hieroglyphs. 

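2 [hieroglyph 1] + 5 [hieroglyph 2] - 1 [hieroglyph 1] + 4 [hieroglyph 2] =
2 [hieroglyph 1] - 1 [hieroglyph 1] + 5 [hieroglyph 2] + 4 [hieroglyph 2] =
(2-1) [hieroglyph 1] + (5+4) [hieroglyph 2]

Figure 4. Equations with hieroglyphic icons



2 [hieroglyph 3] + 5 [hieroglyph 4] - 1 [hieroglyph 3] + 4 [hieroglyph 4] =
2 [hieroglyph 3] - 1 [hieroglyph 3] + 5 [hieroglyph 4] + 4 [hieroglyph 4] =
(2-1) [hieroglyph 3] + (5+4) [hieroglyph 4] =
1 [hieroglyph 3] + 9 [hieroglyph 4].

Figure 5. Equations with hieroglyphic icons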



2.2. Visual multi-sort iconic equations and their solutions 

Consider an iconic multi-sort equation with unknowns x and y:

     5x [lion] + 8x [bird] = 12y [lion]

What does it mean? We can interpret this equation as

     5x [lion] + 8x [bird] = 12y [lion] + 0 [bird]

and construct two equations from it, one for lions and the other for birds:

     5x [lion] = 12y [lion]

     8x [bird] = 0 [bird]

where icons could indicate a data type, similar to our use of kg, sec, m,
and $. For instance, the lion icon can be a lion’s weight, price or size of
habitat.

In general, icons can be interpreted not only as data type modality 
indicators but also as indicators of measurement units similar to our use of 
sec, min, hour, day, week, month, and year. 

In addition, we can use icons as indicators of additional variables with a
specific modality and measurement units. For instance, the lion icon can be
interpreted as the amount of food consumed by a lion per day in kg.
Similarly, the bird icon can indicate the same for a bird. Obviously, food
for lions and birds is quite different even if its amount is expressed in
common units such as kilograms. Ignoring these differences can be incorrect
for many situations and tasks. At this stage we deliberately do not specify
any of these interpretations for the lion and bird icons in the equations
above. We try to use purely syntactic manipulation with icons as long as
possible and go to a specific interpretation only when further syntactic
manipulation cannot continue without an interpretation.

The first equation with lions has many solutions. Namely, every pair (x, y)
for which x = (12/5) y is a solution. The second equation also has many
solutions, which take the form (0, y). Here x and y could be the number of
lions and birds, respectively.

This method is based on reducing the multi-sort equation to two single-sort
equations. An alternative approach would be to simplify the multi-sort
equation before interpreting it by combining all lions and birds together:

     8x [bird] = (12y - 5x) [lion]

If one now chooses the particular case x = y = 1, we can write this case as

     8 [bird] = 7 [lion]

How can the last equation be interpreted? It might represent a relation
between prices. Consider, for example, the case of an exotic bird and a wild
lion. The interpretation could be that the price of the bird is 7/8 of the
price of the lion. Another interpretation could involve habitats. The size
of the bird’s habitat could be 7/8 of the size of the lion’s habitat. In the
price case, the lion and bird icons are interpreted as additional variables:
variable a is the price of a lion and variable b is the price of a bird.
Thus the last equation can be rewritten in a standard way:

     8b = 7a, that is, b = (7/8)a.



The iconic presentation is convenient because we do not need to carry
interpretations into the iconic equations. We can operate with iconic
equations syntactically, just like classical algebraic equations. The
interpretation can then be external, as the semantic meaning of syntactical
operations in iconic algebra. This follows the standard algebraic practice
where abstract equations such as 3y = 15 and y = 15/3 = 5 are used instead
of interpretation-specific equations such as 3y kg = 15 kg or 3y miles = 15
miles.
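A purely syntactic treatment of a multi-sort equation can be sketched in
Python. The sketch reads the equation discussed above as 5x [lion] + 8x
[bird] = 12y [lion] and represents each side as a mapping from a sort name
to its coefficient. The helper names side, move_to_left and equation are
hypothetical; the interpretation of the icons (price, habitat size) stays
outside the code, as the text suggests.

```python
from fractions import Fraction

def side(**coeffs):
    """An iconic expression: a mapping from sort name to its coefficient."""
    return {sort: Fraction(c) for sort, c in coeffs.items()}

def move_to_left(left, right):
    """Syntactic step: rewrite left = right as (left - right) = 0, per sort."""
    result = dict(left)
    for sort, c in right.items():
        result[sort] = result.get(sort, Fraction(0)) - c
    return result

def equation(x, y):
    """5x [lion] + 8x [bird] = 12y [lion] for given numeric x and y."""
    return side(lion=5 * x, bird=8 * x), side(lion=12 * y)

left, right = equation(x=1, y=1)

# Combining all lions and birds: -7 [lion] + 8 [bird] = 0, i.e. 8 bird = 7 lion.
combined = move_to_left(left, right)

# One bird expressed in "lion units", whatever those units are taken to mean.
bird_in_lions = -combined["lion"] / combined["bird"]

print(combined, bird_in_lions)
```

Only after this manipulation does one need to decide whether the ratio
refers to prices, habitat sizes, or some other quantity.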

Currently, with an increasing amount of heterogeneous, multimodal data
coming from a huge variety of different sources, multi-sort iconic
arithmetic can be helpful in data fusion and integration. Typically, at the
beginning, the task is vague and we often do not know exactly what we want
to do. For instance, we might want to operate separately with specific
entities (e.g., birds and lions, or the female population) or with more
general categories of entities (animals or humans). We might be uncertain
about the level of granularity that we need to carry. This is often because
our goal is not yet clearly defined and our ability to reach such a goal is
also uncertain.

Assume that we want to plant some crops in an area with a known total
available size. We are uncertain about the planting plan. It could be a very
detailed plan involving specific individual varieties of wheat, grass and
cotton, or a much more generalized plan for the three categories: wheat,
grass and cotton. In the latter case, the model would be much simpler and
could use average costs of planting and other prices. The information
available may not be sufficient for a more specific planning model, but this
may not be clear in advance. For instance, information may exist for
individual crop varieties that turns out to be unreliable.

Syntactic manipulation with icons permits us to: (1) postpone the exact task
specification, (2) keep great flexibility in task specification, and (3)
avoid being stuck prematurely with some task specification.

The example below illustrates the use of iconic equations and syntactic
manipulation without explicit interpretation of the lion and bird icons and
the variables x and y in advance.

     2x [lion] + 3y [bird] = 4 [lion]              (1)

     5x [lion] + 4y [bird] = 9y [bird]             (2)

We can solve these two equations using an iconic version of the classical
Gaussian elimination method. This iconic generalization can be done in a
matrix form. Multiplying Equation (1) by 5 and Equation (2) by 2, we get:

     10x [lion] + 15y [bird] = 20 [lion]

     10x [lion] + 8y [bird] = 18y [bird]

Subtracting the second equation above from the first one eliminates x:

     7y [bird] = 20 [lion] - 18y [bird]

After simplification this is equivalent to

     y = (20/25) [lion]/[bird]

Now we need to interpret the components of equations (1), (2) and the
equation derived from them, including the “ratio” of icons in the last
equation. In both equations let x be the number of lions and y be the number
of birds. The lion and bird icons are interpreted as additional variables a
and b that stand for the prices of a lion and a bird.

Thus, [lion]/[bird] = a/b is the ratio R between the prices of a lion and a
bird. If R = 25, then the number of birds y is (20/25) · 25 = 20. In this
interpretation, equation (1) means that the total price of 2x lions plus 3y
birds is the same as the price of 4 lions. Equation (2) has a similar
interpretation. In classical terms equations (1) and (2) are:

     2xa + 3yb = 4a

     5xa + 4yb = 9yb.

This means that exactly the same Gaussian elimination can be done in
classical terms. The idea of iconic equations is that a human, when
formulating the problem to be solved, can use icons to assist in problem
formulation and initial reasoning. Then the iconic equations can be
converted to a traditional system of equations and solved analytically. If
we generalize birds and lions as creatures from the beginning, then R = 1
and a = b.

This will produce another solution of equations (1) and (2) that can be
produced analytically in a regular algebraic way.
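The elimination can be checked numerically with a short Python sketch. It
assumes the classical reading of the two equations, 2xa + 3yb = 4a and
5xa + 4yb = 9yb, with a and b the prices of a lion and a bird and R = a/b.
The function names are hypothetical, and the formula for x is derived here
from equation (2) rather than quoted from the text.

```python
from fractions import Fraction

def solve_iconic_system(R):
    """Solve 2xa + 3yb = 4a and 5xa + 4yb = 9yb given the price ratio R = a/b."""
    R = Fraction(R)
    # 5*(1) - 2*(2) eliminates x: 15yb - 8yb = 20a - 18yb, so 25yb = 20a.
    y = Fraction(20, 25) * R
    # Equation (2) reduces to 5xa = 5yb, so x = y / R.
    x = y / R
    return x, y

def check(x, y, a, b):
    """Verify both equations for concrete prices a (lion) and b (bird)."""
    first = 2 * x * a + 3 * y * b == 4 * a
    second = 5 * x * a + 4 * y * b == 9 * y * b
    return first and second

x, y = solve_iconic_system(25)     # lions priced 25 times higher than birds
x1, y1 = solve_iconic_system(1)    # lions and birds generalized as creatures

print(x, y, check(x, y, a=25, b=1))
print(x1, y1, check(x1, y1, a=1, b=1))
```

With R = 25 the sketch reproduces y = 20, and with R = 1 it gives the second
solution mentioned above for the generalized creature category.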



2.3. Systems of iconic equations and iconic linear programming



Further development of this approach permits us to write and solve systems
of iconic linear equations and perform linear programming optimization
tasks. Such optimization tasks can represent highly formalized
decision-making tasks that are typically outside the capabilities of iconic
visualization and the visual decision-making process.

Figure 6 illustrates this iconic problem-solving approach with the goal of
maximizing a linear objective function that includes lions and birds while
satisfying two constraints given as inequalities, one for lions and another
for birds.



Figure 6. Multi-sort iconic linear programming task

The exercise section below contains several tasks that are open problems
in iconic equations and iconic optimization, such as non-linear, discrete
and stochastic iconic optimization problems.

3. CONCLUSION

This chapter has discussed algebraic visual symbolism for problem solving
and lessons learned from the history of algebraic equations from Diophantus
to the present. It was shown that the textual form of an equation is often
ambiguous and much more verbose than either modern or Diophantus’
notations. These lessons from history led us to new concepts for iconic
equations and iconic linear programming tasks.

We discussed several questions. How was it possible that, in spite of the
obvious advantages of algebraic symbolism, algebraic notation was not used
for 1250 years after Diophantus invented it? Was this accidental? The
conclusion was that this was not accidental. It was a result of the
dominance of Euclidean geometric algebra and the fact that Diophantus’
algebra operates with abstract entities that are not directly observable.
Diophantus’ algebra operates with visual but abstract concepts of unknown
values, constants and arithmetic operations. Geometric algebra operates
visually with concrete objects that have a direct match in the real world.
Thus, geometric algebra was much easier to understand. It relies much less
on abstract thinking. Geometry is closer to direct modeling of physical
entities.

It was shown that the history of art provides valuable lessons in the same
vein as the history of algebraic equations. Artists “compress” the space
occupied by texts such as the Bible, myths and proverbs. The important
advantage of “reading paintings” vs. reading text is that we can see the
“whole story” at once using our parallel visual processing. Thus, pictures
compress the time required to “read” a concept in comparison with
sequentially reading text.

Multi-sort iconic equation representation is convenient because we do not
need to carry interpretations into the iconic equations. We can operate with
iconic equations syntactically, similar to what is done in classical
algebraic equations. Interpretation can be external, as the semantic meaning
of syntactical operations in iconic algebra. We also noted that with an
increasing amount of heterogeneous, multimodal data coming from a huge
variety of different sources, multi-sort iconic arithmetic can be helpful in
data fusion and integration. For decision-making tasks, iconic equations are
important because they permit one to work with the high level of uncertainty
that is natural for the initial stages of decision making and problem
solving.



4. EXERCISES AND PROBLEMS 



1. Try to solve a system of two equations x + y = 5 and 3x - y = 2 in text.
Record and compare your time for this solution with the algebraic
symbolic solution. Compare the amount of space used to produce both
solutions.

2. Design an iconic version of the Gaussian elimination method for solving 
systems of linear equations with three variables. 

Advanced 

3. Design an iconic version of the Gaussian elimination method for solving 
systems of linear equations with n variables in a matrix form. 

4. Design an iconic version of the simplex method for solving the linear 
programming task with n variables. 

5. Design an iconic version of a non-linear optimization problem with n 
variables. 

6. Design an iconic version of a stochastic optimization problem with n 
variables. 

7. Design an iconic version of a discrete optimization problem with n vari- 
ables. 



5. REFERENCES 

Al-Khwarizmi, M. Al-jabr wa'l-muqabala [English translation: Al-Khwarizmi, 1974].

Altshiller-Court, N. The Dawn of Demonstrative Geometry. Mathematics Teacher, 57, 1964, 
163-166. 

Cohen, M., Drabkin, I. Source Book in Greek Science. Cambridge, MA: Harvard University 
Press, 1958. 

Geller, L. Start of Symbolism, 1998, http://www.und.nodak.edu/instruct/ Igeller/ 

Miller, J. Earliest Uses of Symbols for Variables, 
http://members.aol.com/jeff570/variables.html, 2001 

O'Connor, J., Robertson, F. An overview of the history of mathematics, 1997,
http://www-gap.dcs.st-and.ac.uk/~history/HistTopics/History_overview.html#17.

O'Connor, J., Robertson, F. Diophantus of Alexandria, 1999,
http://www-groups.dcs.st-and.ac.uk/~history/Mathematicians/Diophantus.html

Parshall, K The art of algebra from Al-Khwarizmi to Viete: a study in the natural selection of 
ideas. History of Science, Vol. 26, No. 72, 1988, pp. 129-164. 

Parshall, Biography of Diophantus, 2002. 

http://www.lib.virginia.edu/science/parshall/diophant.html 

Schroeder M. Number Theory in Science and Communication: with Applications in Cryptog- 
raphy, Physics, Digital Information, Computing, and Self-Similarity (3rd Ed)), Springer- 
Verlag, 1997. 

Swift, J. Diophantus of Alexandria. American Mathematical Monthly, 63 (1956), 163—70. 



Chapter 6 

ICONIC REASONING ARCHITECTURE FOR 
ANALYSIS AND DECISION MAKING

Abstract: This chapter describes an iconic reasoning architecture for
analysis and decision-making along with a storytelling iconic reasoning
approach. The approach includes providing visuals for task identification,
evidences, reasoning rules, links of evidences with pre-hypotheses, and
evaluation of hypotheses. The iconic storytelling approach is consistent
hierarchical reasoning that includes a variety of rules, such as visual
search-reasoning rules that are tools for finding confirming links. The
chapter also provides a review of related work on iconic systems. The review
discusses concepts and terminology, controversy in iconic language design,
links between iconic reasoning and iconic languages, and requirements for an
efficient iconic system.

Key words: iconic reasoning architecture, analysis and decision-making,
storytelling iconic approach, iconic language.



1. INTRODUCTION 

The goal of iconic evidentiary reasoning (IER) is to convey complex
multi-step analytical reasoning and decision making in a more efficient and
condensed way than traditional text reasoning. IER is defined as a visual
support mechanism for the following general problem-solving steps:

• defining the problem,

• generating initial alternatives for hypotheses, which are called
pre-hypotheses,

• linking pre-hypotheses with evidences, and

• generating hypotheses by evaluating pre-hypotheses against evidences.

Typically these steps are repeated several times in the process of refining
hypotheses and evidences in scientific research as well as in intelligence
analysis, engineering and architectural design, market analysis, health care,
and many other areas.

To explain the IER approach we use a modified task of ongoing monitoring of
the political situation in some fictitious country. The final reasoning
result condenses all of the most important visuals from the analytical steps
into a single compact picture. This picture combines several types of icons
and arrows that indicate a conclusion, its status and a chain of evidences
that support the conclusion. For presentation purposes the final part of the
reasoning chain can be enlarged and the icons replaced by actual maps,
imagery and photographs of the people involved.

Major components of the IER architecture are:

• Collecting and annotating analytical reports as inputs using a
markup language, e.g., XML, DAML;

• Providing iconic representation for hypotheses, evidences, scenarios,
implications, assessments, and interpretations involved in the analytical
process;

• Providing iconic representation for confirmations and beliefs
categorized by levels;

• Providing iconic representation for evidentiary reasoning
mechanisms (propositional, first-order logic, modal logic, probabilistic
and fuzzy logics);

• Providing scenario-based visualization and visual discovery of
changing patterns and relationships;

• Providing a condensed version of iconic representation of 
evidentiary reasoning mechanisms for presentation and 

These components are depicted in Figure 1. The use of iconic visuals permits
a user to reach a high condensing ratio. Experiments reported in Chapter 10
show that iconic sentences can occupy one tenth of the space occupied by
text; that is, a compression factor of 10 is possible. Also, people can work
with multidimensional icons two times faster than with text [Spence, 2001].
A similar time and space compression is expected when communicating
analytical results (including the underlying reasoning) to decision-makers
and fellow analysts using IER. Moreover, at some moment, with such
advantages and the ongoing proliferation of visualization technology, iconic
reasoning can become a major way of reasoning and communication in general.
The visual correlation approach described in Chapters 8-10 can be naturally
combined with visual reasoning to improve problem solving.

Figure 1. Iconic evidential reasoning architecture



2. STORYTELLING ICONIC REASONING ARCHITECTURE



2.1 Task identification: output characteristics and pre-hypotheses

In this section we discuss how the process of defining the problem and
generating pre-hypotheses can be done visually. The problem is defined as
ongoing monitoring of a political situation in some fictitious country
[AQUAINT, 2002]. This may include identification of the country and
selection of the processes to be monitored, such as demographic, economic
and democratic processes, research & development, and military activity.
Assume that the user selected the task of monitoring democratic processes
and identified the overall output characteristic to be evaluated as one of
the judgments: positive change, no significant change, negative change, or
mixed change at this moment. In IER, task identification is also done using
an iconic user interface where the user picks the task from the iconic menu
of tasks and characteristics as shown in Table 1.

Table 1. Task identification

Alternative tasks                              Select icon   Selected icon

Monitoring democratic processes                [icon]        [x]
Monitoring economic processes                  [icon]        [ ]
Monitoring military activity                   [icon]        [ ]
Monitoring research & development processes    [icon]        [ ]
Monitoring demographic processes               [icon]        [ ]



Next the user selects output characteristics to be monitored from the menu 
provided in Table 2. 



Table 2. Defining the output characteristics

Characteristics                                          Select icon   Selected icon

Description of a new process                             [icon]        [ ]
Change direction (positive, no change, negative, mixed)  [icon]        [x]
Rate of change (low, medium, high)                       [icon]        [ ]
Description of an emerging leader                        [icon]        [ ]



After that the user picks a pre-hypothesis from the menu for the selected
task. The menu contains all logically possible alternatives for changes in
country Y on the selected scale, as shown in Table 3.

Table 3. Pre-hypotheses and their legends. See also color plates.

Pre-hypotheses                                 Legend 1  Legend 2  Selected icon

H1  Change is positive in country Y at this    [icon]    [icon]    [x]
    time (green shapes)
H2  No significant change in country Y at      [icon]    [icon]    [ ]
    this time (gray shapes)
H3  Change is negative in country Y at this    [icon]    [icon]    [ ]
    time (orange shapes)
H4  Change is mixed in country Y at this       [icon]    [icon]    [ ]
    time (orange/green shapes)





Icons in Table 3 show alternative legends for pre-hypotheses. For instance,
the mixed change alternative (H4) is presented visually in three ways. One
is a rectangle with orange and green components (green indicates a positive
change, and orange indicates a negative change). A two-color flag and two
flags express the same idea a little differently. Analysts have the option
to select the icon legend that best fits their preferences and perceptual
abilities.

2.2 Visual evidences 

In this chapter, we assume that visual evidences are already collected.
They may also be already encoded in a predicate form or in XML form, which
can make their automated iconization easier. Table 4 presents examples of
evidences.

Evidence E11 states that a new person that supports democracy is in power.
This evidence is depicted by the icon of an official with a positive, green
background. A speaker with a neutral, gray background is an icon for
evidence E12 that there are no new indications of suppressing free speech.
Altering the color in the icon for the first evidence produces an icon for
evidence E21, that there are no indications that new people with
alternative views are in power.

Further alteration of the color in the first icon, along with two dots,
produces an icon for evidence E31, where the orange background is
interpreted as negative and the icon means that several new persons that
oppose democracy are in power.

Table 4. Evidence and iconic representations. See also color plates.

Evidence                                   Icon description                      Icon

E11  A new person that supports            An official with a positive, green    [icon]
     democracy is in power                 background
E12  No new indications of suppressing     A speaker with a neutral, gray        [icon]
     free speech                           background
E21  No indication that new people with    An official with a neutral, gray      [icon]
     alternative views are in power        background
E31  Several new persons that oppose       An official with a negative, orange   [icon]
     democracy are in power                background for opposition to
                                           democracy and two dots for
                                           “several” officials
E32  New indications of suppression of     Orange background encodes a           [icon]
     free speech                           negative fact: suppression of free
                                           speech
     A new person that opposes             An official with a negative, orange   [icon]
     democracy is in power                 background for opposition to
                                           democracy
E42  New indications of free speech        Green background encodes a positive   [icon]
                                           fact: free speech indications



The color language and icon content used for the icons in Table 4 are
described in Table 5. This language is easy to learn. It has only two iconic
elements (iconels) for content (a speaker and an official), three colors,
and the presence or absence of dots for quantity, providing a total of
3*2*2 = 12 icons.
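The count of 12 icons can be reproduced by enumerating the combinations of
the three kinds of iconels. The sketch below is only an illustration; the
tuple layout and the names CONTENT, COLORS, DOTS and describe are
assumptions, not part of the iconic language itself.

```python
from itertools import product

CONTENT = ("official", "speaker")                 # content iconels
COLORS = ("green", "gray", "orange")              # background colors
DOTS = (False, True)                              # True: two dots, "several"

MEANING = {"green": "positive", "gray": "neutral", "orange": "negative"}

# Every icon is one choice of color, content and quantity marker.
icons = list(product(COLORS, CONTENT, DOTS))

def describe(color, content, several):
    """Return a short textual legend for one icon."""
    quantity = "several " if several else ""
    return f"{quantity}{content}, {MEANING[color]} ({color} background)"

print(len(icons))
for icon in icons:
    print(describe(*icon))
```

Adding one more content iconel or color to the tuples above immediately
shows how fast the vocabulary grows.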



Table 5. Evidence encoding legend

Icon element (iconel)          Semantic indicator

Green background               positive change, positive statement
Gray background                neutral change, positive statement
Orange background              negative change, positive statement
Person sitting at the desk     Official in power
Person giving a talk           Speech
Two dots                       Several people



2.3 Visual reasoning rules 

Evidences provided in Section 2.2 can be combined in if-then reasoning
rules such as those shown in Table 6, in both natural language and formal
logic. The first rule, “If a new person that supports democracy is in power
(E11) and no new indications of suppressing free speech (E12) then positive
change in country Y (H1) is possible (with some confidence)”, has its formal
equivalent, E11 & E12 -> possible H1.



Table 6. Reasoning rules, templates

     Natural language rules for the hypotheses              Rules

R1   If a new person that supports democracy is in power
     (E11) and no new indications of suppressing free
     speech (E12) then positive change in country Y (H1)
     is possible (with some confidence).

     IF Evidences E11 and E12 take place then H1            E11 & E12 -> possible H1
     (positive change) is possible

R2   IF Evidences E12 and E21 take place then H2 (no        E12 & E21 -> highly probable H2
     significant change) is highly probable

R3   IF Evidences E31 and E32 take place then H3            E31 & E32 -> true H3
     (negative change) is true

R4   IF Evidences E11 and E32 take place then H4            E11 & E32 -> possible H4
     (mixed change) is true



These rules can be visualized in different visual forms. Figure 2 shows the
visualization of rule R1 in two graphical forms. The first line shows the
rule in an abstract block diagram paradigm. The significant difference from
classical formal logic here is the use of colors to indicate the positive
(green) meaning of the terms E11 and H1 and the neutral (gray) meaning of
the term E12. The second line presents the iconic storytelling paradigm,
where the terms E11 and E12 are presented as icons with the same green and
gray backgrounds. In both forms an arrow indicates inference, where P stands
for possible (a modal logic operator). The iconic form reveals more
information than the abstract one and is more appealing perceptually.

Figure 2. Traditional and iconic visualizations of rule R1. See also color plates.

Figure 3 shows inferences for the other rules R2, R3, and R4 from Table 6.
The middle line reveals firm reasons (black arrow) for an orange flag:
“several new persons that oppose democracy are in power” and there are “new
indications of suppressing free speech”. A black arrow indicates a sure
conclusion (true statement). Thus, the orange line of reasoning in the
middle row immediately and preemptively indicates a negative line of events.
Similarly, a gray line indicates a neutral line of reasoning and conclusion,
and a mixed-color line indicates a mixed conclusion. As with Figure 2, this
figure illustrates that the iconic form conveys more information, is more
appealing, and makes a reasoning statement easier to convey.




Figure 3. Traditional and iconic visualizations of rules. See also color plates. 



Table 7 presents some arrow icons used in IER. Arrow icons form a
hierarchy: if we want to encode only the fact that the result is possible,
we can use the first icon in Table 7, but if we want to encode the
possibility more specifically, we can use text markers such as HP and LP for
a high and a low level of possibility. Another option is the use of
partially filled arrows to identify the level of conclusion certainty, as
shown in Table 7.



Table 7. IER selected arrow icons

Icon      Interpretation

[arrow]   P - Possible conclusion
          HP - Highly possible conclusion
[arrow]   Sure conclusion (classical logic true)
[arrow]   Conclusion with 50:50 chances
[arrow]   A search which may yield nothing, a correct result or an
          incorrect result
[arrow]   Conclusion based on the use of a Device
[arrow]   Conclusion based on a Human judgment

2.4 Linking evidence and pre-hypotheses using reasoning rules

Next we link available evidences and a pre-hypothesis using reasoning rules
as matching templates. This is a first step for evaluating pre-hypotheses
and finding plausible hypotheses. If a pre-hypothesis H1 is selected and the
evidences listed in Table 4 are available, then we look through the rules
listed in Table 6 to see if some of them match hypothesis H1 with at least
some of the available evidences. Rule R1 is such a rule, since it matches H1
with E11 and E12:

E11 & E12 -> possible H1.

To find the matching rule R1 we can compare texts in Tables 3 and 4 or
icons in Figure 2 and Tables 3 and 4. Comparing icons has some advantages
because they contain more information than the symbols E11 and E12 and
represent their meaning more directly.

2.5 Pre-hypotheses evaluation against evidence using visual rules

Any logically possible alternative is a pre-hypothesis, but only some of
them are meaningful hypotheses (or hypotheses for short). We assume that
meaningful hypotheses are those pre-hypotheses that (i) have some
confirmation by evidences and opinions or (ii) are expected to receive such
confirmation by evidences or opinions. Thus, we differentiate pre-hypotheses
and hypotheses conceptually.

In the example above, rule R1 matched pre-hypothesis H1 and the available
evidences. This match happened to be complete, that is, every term in the
if-part of rule R1 has been found in the database (Table 4). This creates
the basis for evaluating the pre-hypothesis H1 as “possible” according to
the description of rule R1 presented above. It is easy to see that here we
followed a rule-based approach typical of knowledge-based reasoning systems.
The significant difference is that we can accomplish this reasoning by
iconic means, through human and automatic iconic search.
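The matching step can be sketched as a small rule-based lookup. The sketch
assumes the formal column of Table 6 (E11 & E12 -> possible H1, E12 & E21 ->
highly probable H2, E31 & E32 -> true H3, E11 & E32 -> possible H4); the
data layout and the function name evaluate are hypothetical and stand in
for the iconic search described above.

```python
# Rules of Table 6 as (name, if-part, modality, hypothesis); layout is assumed.
RULES = [
    ("R1", {"E11", "E12"}, "possible", "H1"),
    ("R2", {"E12", "E21"}, "highly probable", "H2"),
    ("R3", {"E31", "E32"}, "true", "H3"),
    ("R4", {"E11", "E32"}, "possible", "H4"),
]

def evaluate(pre_hypothesis, evidence):
    """Return (rule, modality) pairs whose if-part is completely matched."""
    available = set(evidence)
    return [
        (name, modality)
        for name, premises, modality, hypothesis in RULES
        if hypothesis == pre_hypothesis and premises <= available
    ]

# A complete match: every term of the if-part of R1 is among the evidences.
print(evaluate("H1", {"E11", "E12", "E21"}))

# An incomplete match: E32 is missing, so H3 gets no confirmation yet.
print(evaluate("H3", {"E31"}))
```

A pre-hypothesis with an empty result is not rejected; it simply has no
confirming rule for the evidences collected so far.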



3. HIERARCHICAL ICONIC REASONING 

In the previous consideration we assumed that the evidences listed in Table
4 are already given, but in fact they often need to be established. Let us
consider evidence E11 = “a new person that supports democracy is in power”
as a candidate evidence. Actually, this statement is a new hypothesis H11 of
lower level 2. To confirm or refute it we introduce, and visually represent
using icons, two new evidences E111 and E112 shown in Table 8. These
evidences are of the next level 2. Reasoning rules for evidences of this
level are shown in Table 9. Rule R13 is a conjunction of two rules R11 and
R12.

Table 8. Hypotheses and their legends. See also color plates.

Evidences (level 2 hypotheses)                                        Icon

E111  A new pro-democracy person X stands next to the president in    [icon]
      a recent photograph (partial green background encodes
      positive change)
E112  Pro-democracy person X is appointed to the cabinet as the       [icon]
      national security advisor (green background and arrow up
      encode positive change and high rise)







Table 9. Reasoning rules, templates

     Natural language rules for the hypothesis H11                 Rule

R11  A new pro-democracy person X stands next to the president    E111 -> H11
     in a recent official photograph
     If E111 then H11

R12  A new pro-democracy person X is appointed to the cabinet     E112 -> H11
     as the national security advisor
     If E112 then H11

R13  A new pro-democracy person X stands next to the president    E111 & E112 -> H11
     in a recent official photograph and a new pro-democracy
     person X is appointed to the cabinet as the national
     security advisor
     If E111 & E112 then H11





A new level 2 reasoning rule R13 (If E111 & E112 then H11) is shown
visually in Figure 4. Now, keeping in mind that H11 = E11, we can visually
combine the reasoning steps from Figures 2 and 4 to produce a reasoning
chain (see Figure 5).

Figure 4. Comparison of two visual reasoning alternatives. See also color plates.

Figure 5. Reasoning chains. See also color plates.



In this figure the first line shows the match found between H11 and E11 in
the rules, visualized by partially overlapping the blocks for H11 and E11.
Similarly, the second line matches the icons for H11 and E11 by overlapping
them. The third line shows a completed match with full overlapping of the
matched H11 and E11 icons. A user can do this visually by dragging one icon
over another and animating the process and the result.

Dragging is an additional intuitive element of visual reasoning. It is also
possible in abstract reasoning (first line in Figure 5), but it requires
remembering that H11 and E11 are the same. In contrast, icons reveal the
similarity of these concepts instantly. Figure 5 makes the reasoning chain
evident and easy to communicate. The first step of reasoning is firm (black
arrow), but the second one is only possible (arrow with letter P). With a
longer reasoning chain or a tree, an analyst and a decision maker can
quickly see the most questionable reasoning steps that may need more
attention.
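Finding the most questionable step of a chain can be sketched as follows.
The numeric ordering STRENGTH of arrow types and the tuple layout of a
reasoning step are assumptions made for this illustration; the chain itself
follows the two steps described above, a firm level 2 step and a step that
is only possible.

```python
# Hypothetical ordering of arrow types from weakest to strongest.
STRENGTH = {"possible": 1, "highly possible": 2, "sure": 3}

# Each step: (premises, conclusion, arrow type).
CHAIN = [
    (("E111", "E112"), "H11", "sure"),      # level 2 step; H11 is the same as E11
    (("E11", "E12"), "H1", "possible"),     # level 1 step with a P arrow
]

def weakest_steps(chain):
    """Return the steps whose arrow type is the least certain in the chain."""
    lowest = min(STRENGTH[arrow] for _, _, arrow in chain)
    return [step for step in chain if STRENGTH[step[2]] == lowest]

for premises, conclusion, arrow in weakest_steps(CHAIN):
    print(" & ".join(premises), "->", arrow, conclusion)
```

In a longer chain or a tree the same scan points the analyst to the steps
that deserve attention first.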



4. CONSISTENT COMBINED ICONIC REASONING 



In this section we elaborate the process of combining iconic rules in more
detail. As we can see, iconic rules combine evidences and hypotheses
uniformly. We start by visualizing reasoning that combines two visual rules
to produce a new rule, as shown in line 1 in Figure 6.

The first rule is “If a new person that supports democracy is in power
(E11) then positive change in country Y (H1) is possible”. The second rule
is “If there are new indications of free speech (E42) then positive change
in country Y (H1) is possible”. The produced rule on the right is “If a new
person that supports democracy is in power (E11) and there are new
indications of free speech (E42) then positive change in country Y (H1) is
possible”. This is a rule Ri from Table 6. Such conjunction is generic for
combining any rules, and we will call it a conjunction metarule.
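
As a rough illustration, the conjunction metarule can be written as a
function on rules. This is a sketch under assumptions: representing a rule
as a pair of a premise set and a conclusion string is a choice made here,
and the function name is hypothetical.

```python
def conjunction_metarule(rule_a, rule_b):
    """Combine two rules with the same conclusion into one conjunctive rule.

    A rule is a pair (premises, conclusion), where premises is a frozenset
    of evidence labels read as a conjunction.
    """
    premises_a, conclusion_a = rule_a
    premises_b, conclusion_b = rule_b
    if conclusion_a != conclusion_b:
        raise ValueError("rules must share the same conclusion")
    return (premises_a | premises_b, conclusion_a)


rule_1 = (frozenset({"E11"}), "H1 is possible")
rule_2 = (frozenset({"E42"}), "H1 is possible")
combined = conjunction_metarule(rule_1, rule_2)
print(sorted(combined[0]), "->", combined[1])  # ['E11', 'E42'] -> H1 is possible
```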

We can also generate another compact visual reasoning rule, shown in line
2 in Figure 6. This visual rule shows that if a mixture of “gray” and
“green” properties implies a positive change, then a consistent analyst
should accept that two “greens” also imply a positive change. This rule is
based on the principle of monotonicity. The storytelling visual rule is much
shorter and intuitively clearer than the text of this rule:

IF (If a new person that supports democracy is in power (E11) and there are
no new indications of suppressing free speech (E42) then positive change in
country Y is possible)

THEN (If a new person that supports democracy is in power (E11) and there
are new indications of free speech (E12) then positive change in country Y
is possible)










Figure 6. Visual reasoning rules. See also color plates. 



The importance of this monotonicity rule is that we do not need to write it
in advance. We can generate a specific form of this rule automatically
using the principle of monotonicity. This metarule (a rule applied to
rules) will be called the monotonicity metarule.

In the same way, another short visual rule is generated in line 3 of Figure
6. It can represent an analyst’s opinion: “If a positive change is possible
because a pro-democracy person is in power, then positive change is
possible even if there is no progress in free speech”. A visual presentation
of this statement reveals its structure clearly and is shorter. This rule
also has its metarule counterpart, the neutral metarule: adding a neutral
statement (with the & operator) does not change the reasoning result.
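
The monotonicity and neutral metarules can be sketched in the same spirit.
The sketch below is only an illustration: the two-level scale ("gray" for
neutral, "green" for favorable), the dictionary representation of premises,
and both function names are assumptions made here.

```python
LEVELS = {"gray": 0, "green": 1}  # assumed ordering: green is more favorable


def monotonicity_metarule(premises):
    """Derive the rules that a consistent analyst should also accept.

    premises maps an evidence label to its level. Raising any "gray"
    premise to "green" cannot weaken support for the conclusion.
    """
    derived = []
    for label, level in premises.items():
        if LEVELS[level] < LEVELS["green"]:
            upgraded = dict(premises)
            upgraded[label] = "green"
            derived.append(upgraded)
    return derived


def neutral_metarule(premises, neutral_label):
    """Add a neutral ("gray") premise; the conclusion stays the same."""
    extended = dict(premises)
    extended[neutral_label] = "gray"
    return extended


accepted = {"democracy supporter in power": "green", "free speech": "gray"}
print(monotonicity_metarule(accepted))
# [{'democracy supporter in power': 'green', 'free speech': 'green'}]
print(neutral_metarule({"democracy supporter in power": "green"}, "free speech"))
# {'democracy supporter in power': 'green', 'free speech': 'gray'}
```

Because both functions only transform premises, a specific rule never has
to be written in advance; it is produced from an accepted rule on demand.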


4.1 Visual search-reasoning rules



The process of searching for other candidate evidences can bring us to
lower level hypotheses, similarly to the discussion above. In this process,
candidate evidences are considered as pre-hypotheses, and new evidence
candidates for them are generated and visualized on the next, third level.
We make search an explicit part of the reasoning process by introducing
search rules such as rule Rm shown in line 2, Figure 7:

If the name of X is known then search in the list of foreign chiefs for
this name and, if found, retrieve the post occupied.



Thus, this approach visualizes the integration of declarative and
operational knowledge, where search rules represent operational knowledge.
Let us assume that the search produced the following result: Mr. X is a
national security advisor in country Y. The line of reasoning that produced
this result can be expressed visually by rule Rm shown in the second line
in Figure 7.

The textured arrow indicates that the search result can be incorrect or is
not guaranteed. For instance, the name may not be in the search list, or
the list may contain the name but it belongs to another person with the
same name.

Let us assume that (1) we have a candidate evidence E111 = “A new
pro-democracy person X stands next to the president in the recent official
picture”; (2) the analyst has found a photograph with a new person who
stands next to the president during a recent visit to a foreign country;
and (3) the analyst does not know who Mr. X is.

If the name is not known, we need a reasoning rule of the next, fourth
level. For instance, we may have rule Run:

If the name is not known then run face recognition software (FRS) on the
selected face against all annotated images available from country Y.



Figure 7 provides visuals for this reasoning by depicting rules Rmi and R 
used sequentially. 
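
The sequential use of the two search rules can be sketched as follows. This
is a hedged illustration, not a description of any real system: the
function names, the dictionary that stands in for the list of foreign
chiefs, and the recognize_face callback (a stand-in for face recognition
software) are all hypothetical.

```python
def find_post(name, chiefs):
    """Search rule: return the post occupied by name, or None if not listed."""
    return chiefs.get(name)


def identify_person(known_name, face_image, chiefs, recognize_face):
    """Apply the name search, running face recognition first if needed.

    recognize_face is a caller-supplied function that returns a name or
    None for the given image.
    """
    name = known_name if known_name is not None else recognize_face(face_image)
    if name is None:
        return None
    post = find_post(name, chiefs)
    if post is None:
        return None
    # A search result may be wrong (for example a namesake), so mark it.
    return {"name": name, "post": post, "status": "not guaranteed"}


chiefs = {"X": "national security advisor"}  # made-up list entry
result = identify_person(None, "photo-1", chiefs, lambda image: "X")
print(result)
```

Marking every search result as not guaranteed mirrors the textured arrow:
the status travels with the conclusion instead of being remembered
separately.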



Figure 7. Search rules (block labels: face recognition software and image
base; name; post occupied)



4.2 Search for confirming links

Now we need to check that Mr. X is pro-democracy, as stated in evidence
E111: his name should be in the Dem.Name file from an independent source.
This request is presented as a visual rule in line 1 in Figure 8. We may
also have a negative rule Rin 2, depicted in line 2 in Figure 8:

If searching for the political opinion of an official in non-democratic
country Y then do not rely on local official media; search independent
data.

The crossed search arrow in line 2 indicates the negative rule: do not
search using local media.




Figure 8. Search rules with negation (block labels: Dem.Name; link
database and link search software)

Now, if the independent database is not available, we apply another level 4
rule Rimi:

If searching for the political opinion of an official in non-democratic
country Y then search for confirming links of Mr. X.

This rule is depicted in line 3 in Figure 8 as a green “link” block that
requires running interactive link analysis software. The search is
indicated by the search arrow. Link analysis software found a telephone
call from country Y to Mr. X's home in 1998. A caller (Mr. W) confirms that
Mr. X is pro-democracy (“yes” callout in the visual representation). This
rule is shown in Figure 9, where the letter “H” in the arrow indicates a
confirmation from a human.



Figure 9. Visual source quality rule (block labels: trusted source?; link
search, search analysis)

Mr. W is a trusted source (green block “trust”). To make this conclusion
we use a lower level rule such as rule Rumi:

If Mr. W was previously tested successfully using a polygraph then Mr. W
can be trusted.

Figure 10 shows this as a visual reasoning rule, where “D” in the arrow
means “confirmed” by a device. Thus, Figures 8-10 visualize the logic of
search.




Figure 10. Device-based source quality rule (block label: trusted source)



4.3 Integrating visual reasoning components 



Now the successful reasoning steps can be combined into a single visual
evidentiary reasoning scheme (Figure 11). This single picture shows all
eight reasoning steps, the levels of fidelity of conclusions, and the
points where individual reasoning steps are linked. For instance, it shows
that only three steps out of eight are firm in the final conclusion about
positive change in country Y.













Figure 11. Integrated visual evidentiary reasoning scheme. See also color plates. 



Three conclusions are not guaranteed (they came from search). One
conclusion came from a device and one came from a human.
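
The counts above can be checked with a short tally. The sketch below is
illustrative; the step labels are names chosen here, not terms from the
figure.

```python
from collections import Counter

# Step kinds as described for Figure 11; the labels are illustrative.
steps = ["firm"] * 3 + ["search"] * 3 + ["device", "human"]

tally = Counter(steps)
not_firm = len(steps) - tally["firm"]
print(len(steps), tally["firm"], not_firm)  # 8 3 5
```

Five of the eight steps are therefore the ones a reviewer would examine
first.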

This picture can also be augmented with more elaborate levels of
confidence in conclusions, their contradictions, and the quality of the
source.

The example shows tight integration of three stages: Storytelling
Iconographic Visualization, Collecting Information, and Evaluating
Hypotheses. From visualizations similar to the one shown in Figure 11, an
analyst is able to see the current status of the analysis, which can show
that only a few hypotheses have been tested and that they were tested
against only a small number of evidences included in reasoning rules. An
appropriate part of this visualization can be converted into a visual
report to decision makers.

More complex hypotheses require more complex and dynamic icon development.
Ideally, iconic representation should be automated. This subject is
discussed in Chapter 10.

4.4 Visual reasoning for handling signal uncertainty 

Similarly, visual evidentiary reasoning can be applied to improve the
handling of signal uncertainty in identifying the location of a signal
source, using visualization combined with probabilistic and fuzzy logics.
A sketch in Figure 12 illustrates a type of visual reasoning and
presentation that is applicable here.

A traditional approach uses simple ellipses to convey uncertainty of
location based on radar information [Mikulin, Elsaesser, 1995]. A more
elaborate visual representation provides additional information: the
probabilistic distribution of individual fixes (see Figure 13).

A visual reasoning technique conveys additional information: the
probabilistic distribution of individual fixes and their mixtures. Areas
with higher values of a distribution function have more points rendered.
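
One simple way to obtain such a rendering is to draw random points from the
distribution, so that denser regions automatically receive more points.
The sketch below assumes, for simplicity, that each fix is a circular
Gaussian and that a mixture is given by weights; these assumptions, the
function names, and the numbers are introduced here for illustration.

```python
import random


def sample_fixes(fixes, n, seed=0):
    """Draw n points from a mixture of circular Gaussian fixes.

    fixes is a list of (weight, center_x, center_y, sigma) tuples.
    """
    rng = random.Random(seed)
    weights = [fix[0] for fix in fixes]
    points = []
    for _ in range(n):
        _, cx, cy, sigma = rng.choices(fixes, weights=weights, k=1)[0]
        points.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
    return points


def count_near(points, x, y, radius):
    """Count the points within radius of (x, y)."""
    return sum((px - x) ** 2 + (py - y) ** 2 <= radius ** 2 for px, py in points)


points = sample_fixes([(0.7, 0.0, 0.0, 1.0), (0.3, 6.0, 0.0, 1.0)], n=2000)
print(count_near(points, 0.0, 0.0, 1.0), count_near(points, 3.0, 0.0, 1.0))
```

Plotting the returned points as dots gives a picture in which the area
around each fix is visibly denser than the area between the fixes.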




Figure 12. Signal ellipses. See also color plates. 



Figure 13 shows other alternatives for visualizing uncertainty of location
using the iconic approach. It conveys visually, quickly, and in a condensed
way the differences between alternative combinations of fixes, where lines
2 and 3 visualize the traditional ellipsoid approach.

This visualization also naturally conveys a mixture of distributions, as
shown in line 4 in Figure 13. Ellipses from Figure 12 can also be used in
such visual reasoning.

Similarly, this approach can accommodate in visual reasoning new
developments in reasoning about radar emitters, such as “what-and-where”
fusion for recognition and tracking of multiple radar emitters based on a
neural network learning technique [Granger, Rubin & Grossberg, 2001].

Figure 13. Visual rules with signal ellipses. See also color plates.



5. RELATED WORK 



5.1 Concepts and terminology 

Data representations are characterized by several dimensions [Sloman,
1995]. Two dimensions are important in the context of visual reasoning:
formal-natural and verbal-visual [Nadin, Küpper, 2003]. Frixione et al.
[1997] discuss the role of image-like representations in the computational
modeling of mental processes in Artificial Intelligence and Cognitive
Sciences. The authors trace earlier research starting from the theorem
proving machine [Gelerntner, 1959], the use of diagrammatic representations
in problem solving proposed by Funt [1980], accounts of mental imagery in
terms of pictorial representations [Kosslyn, 1980], and the “opposition”
between image-like representations and more linguistically oriented
approaches [Block, 1981]. Later revival studies moved to blending features
of images, diagrams, and propositional systems, e.g., [Chandrasekaran,
Simon, 1992; Gardin, Meltzer, 1989; Kulpa, 1994]. Iconic reasoning is a
term actively used now, e.g., [Frixione et al., 1997]. This subject is also
presented in Chapter 3. Often tasks are very different and the term is
interpreted differently. Iconic communication is another umbrella term
[King, 1999; Yazdani, Barker, 2000] with a focus on developing iconic
languages for human communication. From our viewpoint, the major difference
between iconic communication and iconic reasoning is not in the subject
(icons) but in the application area: problem solving (e.g., robot
navigation) for iconic reasoning and human conversation for iconic
communication. Note that the term “iconic communication” itself is more
general than its scope in current research. Thus, we suggest using this
term as an umbrella term for both areas. A variety of icons are developed
for both purposes [Dreyfuss, 1972, 1984].

The term iconic language is another term with different meanings. Valiant
et al. [1995, 1997] define an iconic language as a language without a
specified syntax. In contrast, iconic programming languages are defined
with a specified syntax.

Peirce categorized the patterns of meaning in visual signs as iconic,
symbolic and indexical, as shown in Table 10 based on [Moriarty, 1995;
Nadin, Küpper, 2003].



Table 10. Peirce’s sign categories

Sign            Description                              Example
iconic sign     looks like, resembles what it            picture of a dog;
                represents                               garbage can icon
indexical sign  a clue that links or connects things     smoke as a sign of fire;
                in nature; the marks left by an object   icicles as a sign of cold;
                                                         fingerprint, wind arrow as
                                                         marks left by an object
symbol          meaning is determined by convention      the US flag; the Statue of
                                                         Liberty; Roman and
                                                         Hindu-Arabic numbers


More terms based on [King, 1999; Valiant, 1997] are described in Table 11. 


Table 11. Sign terminology

Sign            Description                              Example
natural sign    a universally intelligible iconic or     a picture of a dog;
                indexical sign                           icicles as a sign of cold
abstract sign   a symbol, a conventional sign that       Hindu-Arabic numbers
                has to be learned
ideogram        an icon that encodes an idea/concept     Bliss alphabet [Bliss, 1965]
figurative      a metaphorical icon/sign                 the Statue of Liberty
icon/picture
semem           a traditional text message made up of    any text in English
                linguistic entities in a natural
                language


This is not a complete set of the iconic terms used. For instance, semiom
is defined as a message composed with icons which do not necessarily match
up to linguistic entities. Icons that express predicates are called
predicative icons. Single word icons can be expressed with a single word in
a natural language, and multiple word icons correspond to more than one
word in a natural language.

5.2 Controversy in iconic language design

Blissymbolics [Bliss, 1965] is an example of a sign system that was driven
by the comprehensibility of signs; the semantics of a sign is intended to
capture a deep meaning of the concept depicted in the icon. The Roman
number system is another attempt to create a comprehensible sign system.
Compare Roman numbers three (III) and eight (VIII) with the same
Hindu-Arabic numbers 3 and 8. The first two are more “natural” (III directly
shows three units “I”), and the last two (3 and 8) are abstract,
conventional signs without a direct link between their appearance and
contents. History made its choice for Hindu-Arabic numbers in spite of the
fact that they are not “natural”, because they have the advantages
indicated in [King, 1999]:

• encode a complex concept in a simple sign (8 is simpler than VIII),

• support simpler arithmetic than Roman numbers,

• can be easily learned and distributed (compare 1999 and its Roman
equivalent),

• are supported by billions of users around the globe.
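
The comparison of 1999 with its Roman equivalent can be made concrete with
a short conversion routine. The sketch below uses the modern subtractive
convention (IV, IX, XL, and so on), which is an assumption about which
Roman form is meant; the function name is chosen here for illustration.

```python
ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
         (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
         (5, "V"), (4, "IV"), (1, "I")]


def to_roman(n):
    """Convert an integer from 1 to 3999 to a Roman numeral."""
    if not 1 <= n <= 3999:
        raise ValueError("n must be between 1 and 3999")
    parts = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


for number in (3, 8, 1999):
    print(number, to_roman(number), len(str(number)), len(to_roman(number)))
# 3 III 1 3
# 8 VIII 1 4
# 1999 MCMXCIX 4 7
```

The last two columns show the length of each notation: the Hindu-Arabic
form is never longer in these examples, and 1999 needs seven Roman symbols
against four digits.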

Two related extreme claims are listed in [King, 1999]: (1) all sign systems 
and methods of representation are inherently arbitrary and (2) pursuing in- 
trinsic comprehensibility of a sign is a chimera. 

Cruickshank and Barfiel [2000] develop an approach that augments tex- 
tual communication with user-created icons. The authors argue that this 
approach can overcome difficulties of alternatives such as Bliss symbology 
intended to replace textual communication by using a fixed iconic vocabu- 
lary. 

“Bliss faltered and ultimately failed because people are generally not
prepared to learn and communicate with a rigid new language, syntax and
grammar; the required investment in time and understanding outweighs
the potential benefits” [Cruickshank, Barfiel, 2000].

It is suggested that a modern symbol system must be able to grow. This is
especially useful for attempts to create a universal iconic language that
can augment text in many possible situations of human communication (e.g.,
e-mail). However, this ambitious goal is not necessarily the goal of every
problem solving iconic communication. For instance, a music symbolic
language, digital logic, and military and traffic symbologies are
sophisticated iconic languages, but they are not intended to be universal
and can hardly be produced quickly in the course of communication itself.
Such languages should be consistent and have relatively lower ambiguity in
contrast with a language for an unrestricted domain. These examples also
illustrate the difference between two research focuses: problem solving
and conversation between people.

5.3 Iconic reasoning and languages

Typically visual reasoning models belong to one of two categories:
logic-based “analogical” models and hybrid models [Frixione et al., 1997]:

• Logic-based “analogical” models use visual representations
isomorphic to their logical models [Levesque 1988; Johnson-Laird 1983;
Fauconnier 1985], and

• Hybrid models combine logical rule-based modules with diagrammatic
representations [Forbus 1995; Myers, Konolige 1995; Barwise, Etchemendy,
1995].

Often these reasoning models have computational advantages over rea- 
soning models based solely on propositional representations without visual 
components. Sometimes these models also are more expressive. 

The mathematical base of visual reasoning comes from classical logic,
probabilistic reasoning, fuzzy logic, and possibilistic logic. Recently,
description logic (terminological logic) with fuzzy logic components was
added [Lutz, 2003; Straccia, 2001].

Modern approaches in iconic languages start from Isotype [Neurath, 
1978] and Semantography [Bliss, 1965] developed in 1920-1940s. A recent 
related bibliography is presented in [Camhy, Stubenrauch, 2003]. 

Below we comment on a few recent iconic languages. Computer Assisted 
Iconic Language System, CAILS [Champoux, 2001] produces “iconic mes- 
sage objects”. It deals with visual/spatial concept representation with spe- 
cific syntax. Visual references or “words” are classified in the following 
categories: Hands, Movements, Expression, and Pictures. CAILS's grammar 
contains six conjunctions: standard complementizer (that), implicative (if), 
antecessive (because), concessive (but) and connectives (and / with) shown 
in Figure 14. 



Figure 14. CAILS’s conjunction symbols for That, If, Because, But, And, and
With [Champoux, Fujisawa, Inoue, Iwadate, 2000]

A system based upon a set of dynamic visuals with qualitative reasoning 
about information displayed within a document is known as Context Trans- 
port Mark up Language, CTML [Tonfoni, 1996,1998-2001]. 

An iconic communication system to assist a user to construct sentences, 
without typing them in words, i.e. solely relying on icons is called Visual 
Inter Language, VIL [Becker, 2000]. The goal of VIL is to make the system 
language independent so that it can be used universally. 

5.4 Requirements for an efficient iconic system

It is widely believed that an iconic language (IL) and an iconic system 
can be successful if: 

• IL is a specialized language for specific domain (such as music, 
math, traffic control, military, digital logic) with a built-in iconic reasoning 
mechanism. 

• IL is a language naturally growing in communications of people who 
use and spread it (e.g., growth of natural spoken and hieroglyphic lan- 
guages). This approach is advocated by Cruickshank and Barfiel [2000]. 

• Efficient learning procedures are established for IL (e.g., start from a 
small subset). 

• IL is a small language with a very few graphical elements/icons with 
multiple meanings depending on their location relative to other icons. 

For instance, in mathematics, a line above the word lim has one meaning and
a line below lim has another meaning. This polysemantic contextual approach
is called semantic compaction in [Cruickshank & Barfiel, 2000].

Other requirements identified are [King, 1999]: (1) typographic
convenience, (2) ability to draw attention and interaction to a point, (3)
ability to encapsulate a complex meaning in a simple and “on the spot”
message, and (4) ability to support creation of a community of effective
sign-users.

The comprehensibility of icons is still actively discussed; it is obviously
desirable but is often not considered a necessity. If other icons that are
less comprehensible can be simply manipulated, then such icons can survive.
The history of mathematics provides many examples in support of this
point.



6. CONCLUSION 

This chapter presented the iconic evidentiary reasoning (IER) architecture
with iconic storytelling visualization and an overview of the related work.
IER intends to convey complex multi-step analytical reasoning and decision
making in a more efficient and condensed way than traditional text is able
to. IER is defined as a visual support mechanism for the following general
problem-solving steps: defining the problem, generating initial
alternatives for hypotheses that are called pre-hypotheses, linking
pre-hypotheses with evidences, and generating hypotheses by evaluating
pre-hypotheses against evidences.

The architecture is applicable to intelligence analysis, engineering and
architectural design, market analysis, health care, and many other areas.
The IER approach was presented using an example of ongoing monitoring of
the political situation in a fictitious country. As a result, a compact
single picture condenses all reasoning steps. This iconic picture indicates
a conclusion, its status and a chain of evidences that support the
conclusion. For presentation purposes, the final part of the reasoning
chain can be enlarged and icons can be replaced by actual maps, imagery and
photographs of the people involved.

Major components of the IER architecture are: (1) Collecting and
annotating analytical reports as inputs using a markup language, e.g., XML,
DAML; (2) Providing iconic representation for hypotheses, evidences,
scenarios, implications, assessments, and interpretations involved in the
analytical process; (3) Providing iconic representation for confirmations
and beliefs categorized by levels; (4) Providing iconic representation for
evidentiary reasoning mechanisms (propositional, first-order logic, modal
logic, probabilistic and fuzzy logics); (5) Providing scenario-based
visualization and visual discovery of changing patterns and relationships;
and (6) Providing a condensed version of the iconic representation of
evidentiary reasoning mechanisms for presentation and reporting.

Iconic reasoning is much shorter and more perceptually appealing than
text. This is important for communicating analytical results (including
underlying reasoning) to decision-makers and fellow analysts. With these
advantages, at some moment iconic reasoning can become a major way of
reasoning and communication in general. In this chapter, the overview of
iconic studies contrasted iconic reasoning and iconic communication. The
application area for iconic reasoning is problem solving, and the
application area for iconic communication is unrestricted human
communication. The chapter provided a short overview of iconic terminology
started by Charles Peirce. It also discussed the comprehensibility of icons
along with user-created icons vs. a fixed iconic vocabulary. The overview
indicated that visual reasoning models that use visual representations
isomorphic to their logical models and hybrid models that combine visuals
with logical representation often have computational advantages over
reasoning models based solely on propositional representations without
visual components. The overview also briefly presented a history of iconic
languages and ideas of more recent iconic languages.



7. EXERCISES AND PROBLEMS 



1. Develop visual reasoning rules similar to those presented in Figure 6. 
Tip: use OR and negation operations. 

2. Build visual search-reasoning rules similar to those shown in section 4.1.

3. Develop your own integrated visual evidentiary reasoning description
similar to the scheme presented in Figure 11. Tip: Select a text from a
recent mass media report and attempt to present it as visual reasoning.

4. Discuss requirements for an efficient iconic reasoning system based on
considerations presented in section 5.4.



TOWARD VISUAL REASONING AND
DISCOVERY

Lessons from the early history of mathematics 



Boris Kovalerchuk 

Central Washington University, USA 



Abstract: Currently computer visualization is moving from a pure
illustration domain to visual reasoning, discovery, and decision making.
This trend is associated with new terms such as visual data mining, visual
decision making, and heterogeneous, iconic and diagrammatic reasoning.
Beyond the new terminology, the trend itself is not new, as the early
history of mathematics clearly shows. In this chapter, we demonstrate that
we can learn valuable lessons from the history of mathematics for visual
reasoning and discovery.



Key words: Visualization, visual reasoning, visual discovery, history of mathematics. 



1. INTRODUCTION 

In Chapter 1, visuals were classified in three ways: (1) illustration, (2) 
reasoning, and (3) discovery. Illustration demonstrates the essence of enti- 
ties involved and presents a solution statement without showing the underly- 
ing problem solving reasoning process. Reasoning sets up explanatory rele- 
vance of entities to each other and discovery finds relevant entities. These 
categories form the creativity scale shown in Figure 3 in Chapter 1, where
illustration and discovery are the two extremes of this scale with many
intermediate mixed cases. Reasoning occupies the middle of this scale. In
Chapter 1, all three concepts have been illustrated with the Pythagorean
Theorem: (1) visualization of the theorem statement, (2) visualization of
the proof process for the theorem’s statement, and (3) visualization of the
discovery process that identifies the theorem’s statement as a hypothesis.

In visual decision making the listed categories have their counterparts:

• visualization of a decision

• explanation of a decision 

• visualization of the process of discovery of a decision. 

Computer visualization is moving from being pure illustration to
reasoning, discovery, and decision making. New terms such as visual data
mining, visual decision making, and visual, heterogeneous, iconic and
diagrammatic reasoning clearly indicate this trend. Beyond the new
terminology, the trend itself is not new, as the early history of
mathematics clearly shows. In this chapter, we demonstrate that we can
learn valuable lessons from the history of mathematics. The first one is
that all three aspects were implemented in ancient times without the
modern power of computer graphics:

1) Egyptians and Babylonians had a well developed illustration system
for visualizing numbers;

2) Egyptians and Babylonians had a well developed reasoning system
for solving arithmetic, geometric and algebraic tasks using visualized
numbers called numerals;

3) Ancient Egyptians were able to discover and test visually a
non-trivial math relation, known now as the number π.

How can we learn lessons from this history? How can we use them to
accelerate the transition from illustration to decision-making and problem
solving in the new challenging tasks we face now? First, that history
should be described in terms of visual illustration, reasoning and
discovery. This will give an empirical base for answering the posed
questions. Traditionally, texts on the history of mathematics have a
different focus. This chapter could be viewed as an attempt to create such
an empirical base for a few specific subjects.

The first lesson from this analysis is: inappropriate results of illustration 
stage hinder and harm the next stages of visual reasoning and decision mak- 
ing. Moreover, this can completely prevent visual reasoning and decision 
making, because these stages are based on visualization of entities provided 
in the illustration stage. 

The most obvious example of such a case is exhibited by Roman numer- 
als. These numerals perfectly fulfill the illustration and demonstration role, 
but have very limited abilities to support visual reasoning for arithmetic 
(summation, subtraction, multiplication and division). Hindu-Arabic numer- 
als fit reasoning tasks much better. 

The second lesson is that the most natural visualization that seems
isomorphic to real world entities is not necessarily the best for reasoning
and decision making. The Ptolemaic geocentric system was isomorphic to the
observed rotation of the Sun around the Earth, but eventually it became
clear that it does not provide advanced reasoning tools to compute the
orbits of other planets.

The third lesson is that if we want to design (invent) a visualization that
will survive reasoning and decision making tests later, we should be able
to formulate future reasoning and decision making tasks explicitly at the
time when the design of the visualization as illustration is started.

The fourth lesson is that if we design (invent) a visualization without a
clear vision of future reasoning and decision making (problem solving)
tasks, the chance that the visualization will survive reasoning and
decision making tests later is no better than a flip of a coin.

It seems that the history of mathematics points towards the conclusion 
that most initial visualizations of numerals were invented for illustration, 
description and recording purposes. Their usefulness for reasoning and prob- 
lem solving was tested later. Those that survived, namely the Hindu-Arabic 
numerals, we use now. In essence, this history fits the idea of evolution with 
the survival of the fittest. 

In this chapter, we analyze the history of Egyptian and Babylonian
numerals that supports the lessons presented above.



2. VISUALIZATION AS ILLUSTRATION: LESSONS 
FROM HIEROGLYPHIC NUMERALS 

To provide an illustration it is sufficient to visualize concepts involved in 
the solution. Let us consider a simple arithmetic example, 3535+1717=5252. 
There is a justified computational procedure for getting this solution 5252, 
but the expression 3535+1717=5252 does not show the reasoning steps that 
lead us to the solution. 
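
To make the distinction concrete, the hidden reasoning steps of this
example can be listed explicitly. The sketch below is an illustration of
ordinary column-by-column addition with carries; the function name and the
layout of a step are choices made here.

```python
def add_with_steps(a, b):
    """Add two non-negative integers column by column, recording each step.

    Each step is (digit_a, digit_b, carry_in, digit_out, carry_out),
    listed from the rightmost column to the leftmost.
    """
    steps, digits, carry = [], [], 0
    while a > 0 or b > 0 or carry > 0:
        digit_a, digit_b = a % 10, b % 10
        carry_out, digit_out = divmod(digit_a + digit_b + carry, 10)
        steps.append((digit_a, digit_b, carry, digit_out, carry_out))
        digits.append(str(digit_out))
        carry = carry_out
        a, b = a // 10, b // 10
    total = int("".join(reversed(digits))) if digits else 0
    return total, steps


total, steps = add_with_steps(3535, 1717)
print(total)  # 5252
for step in steps:
    print(step)
# (5, 7, 0, 2, 1)
# (3, 1, 1, 5, 0)
# (5, 7, 0, 2, 1)
# (3, 1, 1, 5, 0)
```

The expression 3535+1717=5252 corresponds to the first printed line only;
the four tuples are the reasoning that an illustration leaves out.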

In this example, concepts visualized are input, output, the summation op- 
eration and equality relation. Such visualization tasks were successfully 
solved in ancient Egypt, Greece, India and Mesopotamia by developing 
symbols for numbers and to some extent symbols for operations and rela- 
tions too. 



2.1. Egyptian numerals 

Hieroglyphic numerals. Table 1 shows Egyptian hieroglyphic numerals 
and some of their ideographic meanings [Allen, 2001a, Williams, 2002c, 
Aleff, 2003, Berlin, 2003]. 

Value    Ideographic meaning of the hieroglyph
1        vertical stroke
10       heel bone, vault
10^2     snare, coil of rope
10^3     lotus flower
10^4     bent finger
10^5     burbot fish, tadpole
10^6     kneeling figure with raised arms, Heh-god
10^7     Sun



Compare this with the Roman and Hindu-Arabic numerals presented in Table 2.
The Romans also had other numerals: V for 5, L for 50, and D for 500.


Table 2. Roman and Hindu-Arabic numerals

Power of 10     1   10   10^2   10^3   10^4    10^5     10^6      10^7
Hindu-Arabic    1   10   100    1000   10000   100000   1000000   10000000
Roman           I   X    C      M      X̄       C̄        M̄



Table 3 shows some alternative designs of hieroglyphs from the sources
listed above. Left and right forms were used on the left and right sides of
the text to provide a visually symmetric view of the wall. Several fonts
have been developed for hieroglyphs. Table 3 uses the Gardiner, Glyph basic,
and Nahkt fonts [Bertin, 2003].



Table 3. Symmetric pairs and alternative glyphs (glyph images not reproduced)

Number     Left form    Right form
100        [glyph]      [glyph]
1,000      [glyph]      [glyph]
10,000     [glyph]      [glyph]
100,000    [glyph]      [glyph]




Hieratic numerals. The Egyptian Hieratic numeral system is also base 10.
The Hieratic symbols for 2 and 3 are repeated symbols for 1 ( | ), that is,
|| and |||, and the symbol for 8 is a repeated symbol for 4 (a horizontal
stroke). The same symbols || and ||| are used for 2 and 3 in the Hieroglyphic
and Roman systems. The Hieratic system has further unique symbols for 5, 6,
7, 9, and 10 [Friedman, 2003b]; their glyphs are not reproduced here.

This system also has unique symbols for 20, 30, 40, 50, 60, 70, 80, 90 and
100. The symbol for 20 is directly based on the symbol for 10, and the symbol
for 40 is based on the symbol for 4. The symbols for 200, 300, 400 and 500
are based on the symbol for 100: they are drawn by adding one, two, three or
four dots above it [Friedman, 2003b].



In Table 4, we show how numbers were composed from the base glyphs
presented in Figure 8, using the idea of the Hindu-Arabic decimal positional
system, which is the modern standard for number visualization.

It is so common after several thousand years of use that we often miss the
point that this is just one of the possible number visualization systems. It
becomes more transparent if we contrast it with a textual description of
numbers. The textual description at most visualizes the sequence of
sounds, e.g., the word "thousand" in phonetics-based languages.



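Table 4. Egyptian hieroglyphs and operations [font from Williams, 2002c]
(glyph images not reproduced; | stands for the stroke glyph)

Modern symbols                    =          1 = 10^0    10         10^2       10^3
Egyptian hieroglyphs              [glyph]    |           [glyph]    [glyph]    [glyph]
Modern 3105 = 5 + 100 + 3*10^3               5           0          100        3*10^3
(backward notation)
Egyptian 3105                                |||||                  [glyph]    [glyph] x 3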
In the Egyptian numeral system, the sequence of symbols is flexible and
every power of ten has its own symbol. Thus, some hieroglyphic numerals are
shorter than in modern notation; for instance, 1000000 is written with a
single glyph.



The hieroglyphic system is less positional than the Hindu-Arabic system that
we use now. Changing the sequence of the components does not change the
value of a hieroglyphic number, but for the Hindu-Arabic numeral 1000110 the
backward sequence 0110001 is not equal to 1000110.
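
The contrast between an additive and a positional system can be sketched in a few lines of Python. This is only an illustration: the glyph names are hypothetical labels taken from the meanings listed in Table 1, and the functions are not part of any source cited in this chapter.

```python
# Hypothetical glyph labels; the values follow Table 1.
GLYPH_VALUE = {
    "stroke": 1,
    "heel bone": 10,
    "coil of rope": 100,
    "lotus flower": 1_000,
    "bent finger": 10_000,
    "tadpole": 100_000,
    "Heh-god": 1_000_000,
}


def additive_value(glyphs):
    """Value of an additive numeral: the sum of its glyph values, in any order."""
    return sum(GLYPH_VALUE[g] for g in glyphs)


def positional_value(digits):
    """Value of a decimal digit string, where the position fixes the weight."""
    return int(digits)


numeral = ["Heh-god", "coil of rope", "heel bone"]  # 1,000,000 + 100 + 10
print(additive_value(numeral))                  # 1000110
print(additive_value(list(reversed(numeral))))  # 1000110, order is irrelevant
print(positional_value("1000110"))              # 1000110
print(positional_value("0110001"))              # 110001, reversal changes the value
```

Three glyphs carry the same value as seven decimal digits here, which also illustrates why some hieroglyphic numerals are shorter than their modern counterparts.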

The Roman system is an intermediate system between the Egyptian and
the Hindu-Arabic systems: it is more positional than the Egyptian
system and less positional than the Hindu-Arabic system. At first glance,
Tables 1 and 2 suggest that both the Roman and Egyptian systems provide
simpler visualization for numbers than the Hindu-Arabic system. However,
only the Hindu-Arabic system has survived as a reasoning and decision making
tool for humans, and it was replaced by binary positional systems for
computers only very recently. It is not clear which of these systems was
developed first. It is most likely that all these systems were developed
quite independently and then were tested for ease in solving mathematical
tasks. It is possible that the Hindu-Arabic system is the oldest one.

Egyptians actually used the flexibility of their system to present numbers 
in a variety of ways including several lines. Their numerals are truly two- 
dimensional (see Figure 1). Thus, Egyptians had a well developed illustra- 
tion system for visualizing numbers. 

Figure 1 shows that Egyptians wrote a single number using 2-3 lines,
starting from the larger digits. Alternatively, they used a single line, where
larger digits are on the right because Egyptians wrote from right to left. The
record for the number 300003 also shows that Egyptians used a smaller size for
smaller digits (all six ones "|" occupy the same space as a single glyph for
100,000). The writing for 1/25 in Figure 1 shows that they also used a format
where digits are distributed in two columns. The writing for 3350 shows that
Egyptians also used a "zipper" type of writing, where two of the symbols are
moved down. Sometimes Egyptians used a larger "font" for larger digits.

The symbol for 1000 is twice as large as the symbols for other digits in
the Egyptian 1303 shown in Figure 1. The same idea is implemented in writing
1010005: the digit for 1000000 is more than twice as large as the digits for
10000 and 5. Thus, every number is represented as a complex icon (glyph,
numeral) that is combined from simpler icons (numerals) for basic digits.



This combination has its own visual syntax. The representation of 19607 does
not follow the pattern that larger digits are also larger in size. It shows
the digits for 1000 smaller because there are nine of them. Thus, a more
general rule is that a large number of equal digits is the main reason for
drawing those digits smaller.



[Figure 1: hieroglyphic writings of 120, 1303 + 1/11, 3350, 300003, 1010005,
19607, 1/25, and 276; glyph images not reproduced.]

Figure 1. Free sequence of numeral components
(based on [Friedman, 2003c; Williams, 2002c; Arcavi, 2003; Allen, 2001a])



2.2. Babylonian numerals 



The Babylonians inherited the Sumerian style of writing on clay tablets. 
Their arithmetic was positional and sexagesimal based on 60 with symbols 
for 1, 10, 60, 600, 3600, 36000, and 216000. We follow a simplified notation 
from [Allen, 2001b] where V is used for 1 and -< for 10. In this notation 

7341 = VV VV -<-<V

because 7341 = (2, 2, 21) in base 60 = 2*60^2 + 2*60 + 21.

Base 60 has many advantages. One of them is that other systems can be 
converted to this one (2, 3, 5, 10, 12, 15, 20, and 30 all divide 60) . 
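
The conversion behind this example can be sketched as follows. The function names are illustrative only, and the rendering uses the simplified V and -< notation introduced above; note that a sexagesimal digit of zero would render as an empty string in this sketch.

```python
def to_base60(n):
    """Return the base-60 digits of a non-negative integer, most significant first."""
    digits = []
    while True:
        n, remainder = divmod(n, 60)
        digits.append(remainder)
        if n == 0:
            break
    return digits[::-1]


def render_digit(d):
    """Write one sexagesimal digit (0-59) with -< for 10 and V for 1."""
    tens, ones = divmod(d, 10)
    return "-<" * tens + "V" * ones


digits = to_base60(7341)
print(digits)                                     # [2, 2, 21]
print(" ".join(render_digit(d) for d in digits))  # VV VV -<-<V
```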
The Babylonian system permitted some shortcuts. Without the shortcut, the
number 19 is represented as -<VVVVVVVVV. Table 3 shows a shortcut for
this number that uses the idea of subtraction (19 = 20-1). This way the
symbol for 1, V, is not repeated nine times [Allen, 2001b].



Table 3. Babylonian numeral 19 (based on [Allen, 2001b])

Numeral                           Description
-<-<V (negation symbol over V)    19 is presented as 20-1, where 20 is -<-< and
                                  1 is V. The negation symbol is over the
                                  symbol for 1.
-<VVVVVVVVV                       19 as 10+9, where the symbol -< stands for 10
                                  and 9 small symbols V stand for 1.



This brief description shows that the Babylonians had a well developed 
illustration system for visualizing numbers. It was not limited to integers; 
Babylonians also used fractions. Their abilities for reasoning with numbers 
included extracting square roots, solving linear systems, using Pythagorean 
triples such as (3, 4, 5): 3^2 + 4^2 = 5^2, and solving cubic equations using tables.
Several of these actions can be qualified as visual reasoning too. 

2.3. Results of arithmetic operations 

People in ancient Egypt knew how to visually represent results of arith- 
metic for both integers and fractions (see Figures 2 and 3). 



[Figure 2: hieroglyphic representation of 215 + 57 = 272; glyph images not
reproduced.]

Figure 2. Visual adding integers







[Figure 3: hieroglyphic representations of 1/10, 1/10+1/10=2/10, 1/25, and
1/25+1/25=2/25; glyph images not reproduced.]

Figure 3. Visual adding fractions



Egyptians also knew how to add numbers visually (see Figures 4, 5) 
Adding visually by columns,    Adding visually       Adding using text
1700 BC, Egypt                 column by column      (method does not exist yet)

[hieroglyphs for 215]          215                   Two hundred fifteen
[hieroglyphs for 57]           57                    Fifty seven
[hieroglyphs for 272]          272                   Two hundred seventy two

Figure 4. Adding numbers using symbols vs. text



Let us assume that we need to sum the numbers 215 and 57 written in
words: two hundred fifteen plus fifty-seven. How can we do this using the
words? A procedure to do this does not exist even after 4000 years of using
numbers. The only method known now is converting verbal numbers to one
of the symbolic (visual) forms.



[Figure 5: hieroglyphic doubling table; glyph images not reproduced.]

1      2801
2      5602 = 2801*2
4      11204 = 5602*2
Sum    19607 = 2801 + 5602 + 11204

Figure 5. Visual summation (based on Arcavy, 2002)
In the next section, we analyze the visual reasoning in Egyptian mathematics
that is representative of the level of sophistication reached in the Ancient
world in visual reasoning.



3. VISUAL REASONING: LESSONS FROM 
HIEROGLYPHIC ARITHMETIC 

Reasoning includes procedures for getting results from arithmetic opera- 
tions. A proof that the operation is correct came from summation and multi- 
plication tables. In early Egypt, addition and subtraction were simple visual 
processes using the counting glyphs. To add two numbers, we collect all 
symbols of the same type and replace ten of them by one of the next higher 
order. For example, adding and subtracting 5 and 7 is shown in Table 5, 
where ten symbols | are substituted by a single symbol that means 10. 
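
The grouping and substitution rule described above can be sketched as a short program. This is an illustrative model under stated assumptions, not a reconstruction of an attested procedure: a numeral is modeled as a dictionary that maps a power of ten to the number of glyphs of that order, and all function names are hypothetical.

```python
def to_glyphs(n):
    """Model a numeral as {power of ten: number of glyphs of that order}."""
    glyphs, power = {}, 0
    while n > 0:
        n, digit = divmod(n, 10)
        if digit:
            glyphs[power] = digit
        power += 1
    return glyphs


def add_visually(a, b):
    """Collect glyphs of the same type, then replace ten of them by one of the next order."""
    pile = {}
    for glyphs in (a, b):
        for power, count in glyphs.items():
            pile[power] = pile.get(power, 0) + count
    power = 0
    while pile and power <= max(pile):
        if pile.get(power, 0) >= 10:
            pile[power] -= 10                             # remove ten glyphs
            pile[power + 1] = pile.get(power + 1, 0) + 1  # add one of the next order
        power += 1
    return {p: c for p, c in pile.items() if c > 0}


def value(glyphs):
    """Numeric value of a glyph collection."""
    return sum(count * 10 ** power for power, count in glyphs.items())


print(add_visually(to_glyphs(5), to_glyphs(7)))           # {0: 2, 1: 1}
print(value(add_visually(to_glyphs(35), to_glyphs(17))))  # 52
```

For 5 + 7 the pile first holds twelve strokes; ten of them are replaced by one symbol for 10, leaving two strokes and one ten.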



Table 5. Visual arithmetic (| stands for 1 and ∩ for 10)

                  Addition          Subtraction
7                 ||||| ||          ||||| ||
5                 |||||             |||||
result            ||||| ||||| ||    ||
compact result    ∩ ||


Addition and subtraction of 35 and 17 are shown in Table 6 where numbers 
are presented in backward sequence of decimal positions. These operations 
use the same visual techniques: grouping and substitution. 



Table 6. Visual addition of 35 and 17 (| stands for 1 and ∩ for 10)

Number (in modern notation)            Decimal position
                                       | (10^0)               ∩ (10^1)
35                                     ||||| (5)              ∩∩∩ (30)
17                                     ||||| || (7)           ∩ (10)
Sum 52 = (12 + 40)                     ||||| ||||| || (12)    ∩∩∩∩ (40)
Sum 52 with use of "carry"             || (2)                 ∩∩∩∩∩ (50)
(after shifting ten "|" to the
10^1 position as ∩)

Similarly, Table 7 shows the addition of 70 and 10. 
Table 7. Summation (with opposite sequence of decimal positions)
(| stands for 1 and ∩ for 10)

Number (in modern notation)            | (10^0)           ∩ (10^1)
Sum 70 = 10 + 60                       |||||||||| (10)    ∩∩∩∩∩∩ (6*10)
Sum 70 = 10 + 60                       (0)                ∩∩∩∩∩∩∩ (7*10)
(after shifting ten "|" to the
10^1 position as ∩)

Multiplication and division were done by using visuals and a lookup
multiplication table containing only 2n, 4n, 8n and so on for every n.
Numbers in the table were generated by repeated visual summation, such as
n + n = 2n, then 2n + 2n = 4n, and so on (see Table 8 for the number 25).
For example, to multiply 25 by 11, the property 11 = 1 + 2 + 8 is used along
with the two times multiplication table (Table 8):
25*11 = 25*(1+2+8) = 25 + 25*2 + 25*8 (see Table 9).
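
The doubling procedure described above can be sketched in Python. The function names are illustrative, and the sketch assumes positive integers; it builds the doubling table by repeated addition only and then selects the rows whose multipliers sum to the second factor.

```python
def doubling_table(n, limit):
    """Rows (k, k*n) for k = 1, 2, 4, ... up to limit, built by repeated addition."""
    rows = []
    k, kn = 1, n
    while k <= limit:
        rows.append((k, kn))
        k, kn = k + k, kn + kn  # doubling is just adding a number to itself
    return rows


def multiply(n, m):
    """Multiply n by m using only the rows of the doubling table for n."""
    total, remaining, used = 0, m, []
    for k, kn in reversed(doubling_table(n, m)):
        if k <= remaining:      # this row takes part in the sum
            remaining -= k
            total += kn
            used.append(k)
    return total, sorted(used)


print(doubling_table(25, 11))  # [(1, 25), (2, 50), (4, 100), (8, 200)]
print(multiply(25, 11))        # (275, [1, 2, 8])
```

The selected multipliers 1, 2 and 8 are exactly the decomposition 11 = 1 + 2 + 8 used in the text, and the row for 4 is skipped.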



Table 8. Two times multiplication table for number 25
(| stands for 1, ∩ for 10, and @ for the coil-of-rope glyph, 100)

1*25    ||||| ∩∩    25
2*25    ∩∩∩∩∩       50 = 25+25
4*25    @           100 = 50+50
8*25    @@          200 = 100+100


Table 9. Multiplication 25*11
(| stands for 1, ∩ for 10, and @ for the coil-of-rope glyph, 100)

1*25                  ||||| ∩∩             25
2*25                  ∩∩∩∩∩                50
8*25                  @@                   200
25*(1+2+8) = 25*11    ||||| ∩∩ ∩∩∩∩∩ @@    275 = 25+50+200
                      ||||| ∩∩∩∩∩∩∩ @@     275



To get a feeling for the advantages of this visual process, we just need to
try to multiply 25 by 11 solely in a textual form. This simple task becomes
very difficult to solve, and it is even harder to prove that the result is
correct. But the visual arithmetic computation process is not simple either if
we try to record it completely. Below we show what happens when we add two
numbers, 35 and 17, using standard arithmetic techniques. We use a spreadsheet
visualization similar to the one used in MS Excel. In Table 10, the number 35
occupies cells (1,3) and (1,4), and the number 17 occupies cells (2,3) and
(2,4). The result should be located in cells (3,3) and (3,4). There are also
cells (4,2) and (4,3) reserved for writing carries. We use the symbol aij for
the content of cell (i,j). In this notation, the algorithm for adding 35 and
17 consists of two steps:

Step 1: If a14 + a24 > 9 then a34 := (a14 + a24) - 10 & a43 := 1
        else a34 := a14 + a24 & a43 := 0

Step 2: If a13 + a23 + a43 > 9 then a33 := (a13 + a23 + a43) - 10 & a42 := 1
        else a33 := a13 + a23 + a43 & a42 := 0



Table 10. Visual summation

         j=1      j=2    j=3    j=4
i=1                      3      5
i=2                      1      7
i=3      sum             5      2
i=4      carry    0      1

At first glance, this algorithm does not appear to be visual, but the placement
of numbers is visual, similar to what users do every day when working with the
Excel spreadsheet graphical user interface. People accomplish these steps
easily visually, but it is hard to explain steps 1 and 2 without visuals,
although together they are almost a complete computer program.
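
Steps 1 and 2 can be written out as a complete program. The sketch below assumes two-digit inputs and keeps the 1-based cell indices of Table 10 (row and column 0 are left unused); the function name is hypothetical, and the carry from the units column is included in both branches of Step 2.

```python
def add_two_digit(x, y):
    """Add two two-digit numbers cell by cell; returns the grid a[i][j] of Table 10."""
    a = [[None] * 5 for _ in range(5)]  # indices 1..4 are used, index 0 is ignored
    a[1][3], a[1][4] = divmod(x, 10)    # first number: tens and units
    a[2][3], a[2][4] = divmod(y, 10)    # second number: tens and units

    # Step 1: units column, result in a[3][4], carry in a[4][3]
    s = a[1][4] + a[2][4]
    if s > 9:
        a[3][4], a[4][3] = s - 10, 1
    else:
        a[3][4], a[4][3] = s, 0

    # Step 2: tens column plus carry, result in a[3][3], carry in a[4][2]
    s = a[1][3] + a[2][3] + a[4][3]
    if s > 9:
        a[3][3], a[4][2] = s - 10, 1
    else:
        a[3][3], a[4][2] = s, 0
    return a


grid = add_two_digit(35, 17)
print(grid[3][3], grid[3][4])  # 5 2  (the sum row)
print(grid[4][2], grid[4][3])  # 0 1  (the carry row)
```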



4. VISUAL DISCOVERY: LESSONS FROM THE
DISCOVERY OF π

Ancient Egyptians were able to discover and test visually a non-trivial
mathematical relation between the diameter of a circle, D, and its area, S.
Now this relation is expressed using the number π: S = (π/4)D^2.

Egyptians discovered this relation for a specific diameter D = 9 in the form
S = (D-1)^2 = (9-1)^2. It can be generalized to
S = (D - D/9)^2 = (8D/9)^2 = (64/81)D^2.
We can compare the Egyptian coefficient 64/81 = 0.790123 with π/4 =
0.785398 and notice that the formula discovered in Egypt is remarkably
accurate. Below we discuss this discovery in more detail. Problem 50 from the
Rhind papyrus (about 1550 BC) [Williams, 2002b] is about this relation
between the diameter and the area of the circle:

“A circular field has diameter 9 khet. What is its area? ” 

One khet is about 50 meters. This papyrus is now in the British Museum. Its
detailed description is presented in [Robins, Shute, 1998; Chance et al.,
1927, 1929]. The papyrus was made by the scribe Ahmose and is sometimes
called the Ahmose papyrus. The solution provided in the Rhind papyrus for
problem 50 is [Friedman, 2003a]:

You are to subtract one ninth of it, namely 1: remainder 8.

You are to multiply 8 two times: it becomes 64. This is its area in
land, 6 "thousands-of-land" and 4 setat.
In modern terms, it means, as we already mentioned:

If diameter D = 9 then circle area S = (9-1)(9-1) = 8*8 = 64.

In accordance with this formula, the number π would be 3.160494, which is
quite close to the correct value 3.141593. To get this value we just need to
notice that S = (9-1)^2 = 64 = π(9/2)^2. Thus, π = 64/(9/2)^2 = 3.160494.
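
The arithmetic above can be checked with exact fractions. This is a small verification sketch; the variable names are illustrative.

```python
from fractions import Fraction
import math

D = 9
area = Fraction(D - 1) ** 2               # (9 - 1)^2 = 64
coefficient = area / D ** 2               # S / D^2 = 64/81, to compare with pi/4
implied_pi = area / Fraction(D, 2) ** 2   # S / (D/2)^2 = 256/81

print(coefficient, round(float(coefficient), 6))  # 64/81 0.790123
print(implied_pi, round(float(implied_pi), 6))    # 256/81 3.160494
print(round(math.pi / 4, 6))                      # 0.785398

relative_error = (float(implied_pi) - math.pi) / math.pi
print(f"{relative_error:.2%}")                    # 0.60%
```

The implied value of π is the exact fraction 256/81, which overestimates π by about 0.6 percent.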

How was it possible to discover such an accurate result 3500 years ago?
We follow Williams' conjecture that it was done by means of visual discovery:
"An alternate conjecture exhibiting the value of π is that the Egyptians
easily observed that the area of a square 8 units on a side can be reformed to
nearly yield a circle of diameter 9."

Figure 6 reconstructs how such a discovery could be made. It shows a variety
of circles and squares with small circles inside. The 8 by 8 square has
64 small circles and the circle with a diameter of nine small circles has 67
of those circles. These numbers, 64 and 67, are similar: their difference is
less than 5% of each of them. The difference between any other pair of numbers
in Figure 6 is greater; thus, it is discovered that the area of the 8x8 square
closely approximates the area of the circle with diameter 9. This thought
experiment could be conducted in ancient Egypt physically with small river
rocks, apples, or seeds. Ancient Egyptians could easily collect hundreds of
these items of almost the same size.




[Figure 6: squares and circles filled with small circles; the counts labeled
in the figure include 25 and 64. Image not reproduced.]

Figure 6. Visual experiment (adapted from [Williams, 2002a])
Now we need to analyze how, after discovering that some circle and some
square hold a similar number of rocks, one can discover the mathematical
indicators that match the areas of the circle and the square. We already
assumed that these indicators are the diameter (or radius) of the circle and
the side of the square. It is a very realistic assumption. These indicators
were already known to Egyptians as the main characteristics of a circle and a
square. But we still need to discover the relation between their squares: the
squared side of the square equals πR^2, with an unknown coefficient π.

In essence, this is a data mining task in modern terms. It could be tried in
ancient times visually, again by experimenting. For instance, rocks can be
counted in the square that is contained in the circle and in the square that
contains the circle, giving S1 < S < S2. Here, S1 is the number of rocks in
the contained square and S2 is the number of rocks in the square that contains
the circle with area S. Obviously this approach would give a very rough
estimate of π. A more accurate result can be obtained by approximating a
circle by smaller squares and counting rocks that are contained in the circle.
For instance, we can approximate the circle area by subtracting from the total
area of the surrounding square (9*9 = 81) the area that is not in the circle.
This is about a half of the four small 3x3 squares in the corners of the 9x9
square in Figure 7, that is, 4(3*3)/2, with the final result
9*9 - 2*3*3 = 81 - 18 = 63, which is close to 8*8 = 64.
In essence, this is an octagon approximation of the circle.
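
The octagon area can be verified with the shoelace formula. The coordinates below are an assumption made for illustration: the 9x9 square is placed with one corner at the origin and each side is cut at one third and two thirds of its length.

```python
import math


def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula."""
    total = 0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# Octagon obtained by cutting a 3x3 right triangle from each corner of the square
octagon = [(3, 0), (6, 0), (9, 3), (9, 6), (6, 9), (3, 9), (0, 6), (0, 3)]
square = [(0, 0), (9, 0), (9, 9), (0, 9)]

print(polygon_area(square))    # 81.0
print(polygon_area(octagon))   # 63.0
print(math.pi * (9 / 2) ** 2)  # about 63.6, the area of the circle of diameter 9
```

The octagon area 63 lies slightly below the circle area and the square 8*8 = 64 slightly above it.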

This visual approach has been used in problem 48 of the Rhind papyrus.
In general, problems 41-43, 48, and 50 of that papyrus are devoted to the
circle area computation. Problem 48 of the Rhind papyrus states [Wright,
2000]:

The area of a circle of diameter 9 is the same as the area of a square of 

side 8. Where does this come from? 

To justify statement (1) problem 48 contains a famous drawing of a square 
with an inscribed octagon: 

A a b B 
h c 

g d 

D f e C 

where ABCD is a square and abcdefgh is an irregular octagon [Gnaedinger,
2001]. Problem 48 differs from problem 50 discussed above. Problem 48 asks
for a justification (reasoning, proof) of the already discovered statement

AreaCircle(9) = AreaSquare(8),    (1)

where 9 is the length of the diameter of the circle and 8 is the length of the
side of the square. In contrast, problem 50 asks for the discovery of the
statement presented in (1). Figure 7 presents the inscribed octagon
graphically.



[Figure 7: square ABCD with the inscribed octagon abcdefgh; image not
reproduced.]

Figure 7. Illustration for the Rhind problem 48 (based on [Wright, 2000])

This simple method provides a reasonable approximation of π. As is now
known, this approximation idea permits one to get π with any desired
accuracy by using an n-gon with large n.
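
The convergence of the n-gon approximation can be illustrated numerically. The sketch below is not the Egyptian method: it uses the later side-doubling construction for regular polygons inscribed in a unit circle, starting from the hexagon, so that π itself is never used in the computation. The function name is hypothetical.

```python
import math


def ngon_areas(doublings):
    """Areas of regular 12-, 24-, 48-, ... gons inscribed in a circle of radius 1."""
    n, side = 6, 1.0  # regular hexagon: its side equals the radius
    areas = []
    for _ in range(doublings):
        # The inscribed 2n-gon has area n * side / 2, where side is the n-gon side.
        areas.append((2 * n, n * side / 2))
        # Side of the 2n-gon from the side of the n-gon (unit radius).
        side = math.sqrt(2 - math.sqrt(4 - side * side))
        n *= 2
    return areas


for sides, area in ngon_areas(8):
    print(sides, area)
# The first row is the 12-gon with area 3.0; the areas grow toward pi.
```

Since the circle has radius 1, each area is itself an estimate of π, and the estimates increase with the number of sides.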



5. CONCLUSION 

Examples in this chapter illustrate the power of visual discovery com- 
bined with mathematical computations and reasoning. Below we summarize 
the characteristics of ancient arithmetic based on the analysis in this chapter: 

1. Ancient arithmetic was visual.

2. Ancient arithmetic involves explicit reasoning.

3. There are exact reasoning rules for operating with visual entities to
obtain the result of an operation.

4. Reasoning rules are specific for each arithmetic operation.

5. Rules of more complex operations are based on rules for simple
operations (e.g., multiplication is based on addition and division is based
on multiplication).

6. The goal of each visual operation was well defined.

Although formal models of the mathematical operations can be built, the
actual Egyptian system is a mixture of visual intuitive procedures and formal
manipulation, with minimized double conversion between analytical and visual
representations of the problem (see the discussion of this subject in Chapter
1, Section 1.5). If we compare many modern visualization tools with the
characteristics presented above, we see that we have not reached the level of
sophistication known in ancient Egypt more than 3000 years ago. For instance,
visual data mining does not go further than showing glyphs such as squares
and rectangles from different viewpoints, and visual guidance on how to
experiment with visual tools for pattern discovery is not mature yet.

In Chapter 16, we present a new visual data mining technique based on 
Monotone Boolean functions that intends to fill this gap for the Boolean data 
type. The technique permits discovering patterns by analyzing structural in- 
terrelations between objects (cases) in the original visualization and by 
changing the visualization to its modifications that permits one to see a con- 
tinuous border between patterns if it exists. 

To build an advanced visual discovery system we need to start with a
clearly stated goal, as was done in the examples analyzed in this chapter,
such as the goal of discovering a formula to compute the area of the circle.
In modern visual data mining, the goal is discovering patterns. At first
glance, it looks as if we also have a goal, but this goal is not as well
stated as computing the area. The correctness of the area computation can be
tested using well-stated and simple criteria. Like data mining, the goal in
imagery conflation (see chapters 17-21) is to find matching features. However,
there are no natural, well-stated formal criteria to test if the goal is
reached, even if people are able to match features by visual inspection of two
images using informal, tacit rules. There are two important questions:

(1) How can we know if a task can be solved by visual means? 

(2) How do we select tasks to be solved by visual means? 

The answer to both questions, based on our analysis of the early history of
mathematics, is: the task should have a goal and formalized criteria to
judge that the goal is reached, as was the case for these mathematical tasks.
For less formal tasks, visual reasoning is still possible, as chapters (BK) and
DB indicate, but the conclusions may be much less definitive and the
methods may be less sophisticated.

In Chapter 1 (Section 2.1) we discussed visualization, visual reasoning 
and visual discovery for the Pythagorean Theorem. The first two tasks were 
successfully solved visually many times (there are more than 300 different 
proofs of the theorem), but it is not the case for visual discovery of the theo- 
rem statement. It is difficult to formulate formal criteria for visual discovery. 
The goal can be formulated easily - to discover the theorem statement. 
However, we cannot assume that parameters and the types of relations 
(polynomial or other) between them are known if it is true discovery. Thus, 
the task should be formalized. This can be done in many different ways and 
the solutions can be quite different. The early history of mathematics clearly 
shows the trend from illustration to visual reasoning and discovery. This 
chapter demonstrates that we can learn valuable lessons from this history. 
The main lessons are: 

• inappropriate results at the illustration stage harm the next stages of
visual reasoning and decision making;

• to invent a visualization that will survive visual reasoning and decision
making tests, reasoning and decision making tasks should be formulated
explicitly when one designs visualization as illustration.

Future work will use additional empirical information about the use of
visual reasoning and visual discovery in ancient mathematics to analyze how
to solve modern problems visually.

6. EXERCISES AND PROBLEMS 

1. Compute visually 71 + 71 in the Egyptian hieroglyphic system using Ta- 
ble 11 for 70+70 as a prototype. 

Table 11. Hieroglyphic arithmetic for 70
(| stands for 1, ∩ for 10, and @ for the coil-of-rope glyph, 100)

Number (in modern notation)            Decimal position
                                       | 10^0    ∩ 10^1             @ 10^2
70                                     0         ∩∩∩∩∩∩∩ (70)
70                                               ∩∩∩∩∩∩∩ (70)
Sum 140 = 10*14                        0         ∩∩∩∩∩∩∩ (140)
                                                 ∩∩∩∩∩∩∩
Sum 140 = 100 + 40                     0         ∩∩∩∩ (40)          @ (100)
(after shifting ten ∩ to the
10^2 position as @)



2. Compute visually 145 + 145 in the Egyptian hieroglyph system using Ta- 
ble 12 for 140 + 140 as a prototype. 

Table 12. Hieroglyphic arithmetic for 140
(| stands for 1, ∩ for 10, and @ for the coil-of-rope glyph, 100)

Number (in modern notation)            Decimal position
                                       | 10^0    ∩ 10^1             @ 10^2
140                                    0         ∩∩∩∩ (40)          @ (100)
140                                              ∩∩∩∩ (40)          @ (100)
Sum 280 = 80 + 200                     0         ∩∩∩∩ ∩∩∩∩ (80)     @@ (200)



3. Compute visually 37 + 74 + 280 in the Egyptian hieroglyph system using
Table 13 for 35 + 70 + 280 as a prototype.
Table 13. Hieroglyphic arithmetic for 35, 70 and 280
(| stands for 1, ∩ for 10, and @ for the coil-of-rope glyph, 100)

Number (in modern notation)      Decimal position
                                 | 10^0       ∩ 10^1               @ 10^2
35                               ||||| (5)    ∩∩∩ (3*10)
70                                            ∩∩∩∩∩∩∩ (7*10)
280                              0            ∩∩∩∩ ∩∩∩∩ (8*10)     @@ (200)
Sum 385 = 5 + 180 + 200          ||||| (5)    ∩∩∩                  @@ (200)
                                              ∩∩∩∩∩∩∩
                                              ∩∩∩∩ ∩∩∩∩ (180)
Sum 385 = 5 + 80 + 300           ||||| (5)    ∩∩∩∩ ∩∩∩∩ (80)       @@@ (300)
(after shifting ten ∩ to the
10^2 position as @)


4. Analyze the efficiency of hieroglyphic visualization for arithmetic
operations. Do you see any cases where summation or multiplication using
hieroglyphic numerals can be accomplished faster than using the Hindu-Arabic
numerals? Tip: Start from the cases presented in exercises 1-3.
VISUAL CORRELATION METHODS AND
MODELS



Boris Kovalerchuk 

Central Washington University, USA 



Abstract: This chapter introduces the concept of visual correlation and
describes the essence of a generalized correlation to be used for multilevel
and conflicting data. Several categories of visual correlation are presented
accompanied by both numeric and non-numeric examples with three levels (high,
medium and low) of coordination. We also present examples of multi-type visual
correlations. Next, the chapter provides a classification of visual
correlation methods with corresponding metaphors and criteria for visual
correlation efficiency. Finally, the chapter finishes with a more formal
treatment of visual correlation providing formal definitions, analysis, and
theory.

Key words: Visual correlation, heterogeneous data, visual data mining,
information visualization, glyph, metaphor, classification, guidance,
distortion, formalization, homomorphism, relational structure.



1. INTRODUCTION 



1.1 The motivation for a generalized correlation concept

The purpose of visual correlation (VC) is to represent and discover the
correlation between objects and events (O/E) visually. It has its own value
for many applications and has significant potential to support decision
making. Several complex questions need to be answered if visual correlation is
to be implemented successfully:

(1) How does one visually correlate non-numeric O/E data?

(2) How does one visually correlate O/Es with different levels of
resolution?

(3) How does one visually correlate conflicting data for a given O/E?

(4) How does one visualize data for different categories of users?

(5) How can an O/E symbol be made "rich enough" to portray the differences
between O/Es?

To illustrate the problem and various approaches to it, we start with the
non-traditional problem of correlating non-numeric O/E data. One of the
major challenges here is that often such O/Es are represented by
non-structured or semi-structured text. One solution, a visual correlation
system and visual language BRUEGEL described in Chapter 10, deals with this
problem for text that is tagged with XML tags. The system and language are
named after Pieter Bruegel the Elder (1525-1569). The naming is not
accidental.

The visual correlation in BRUEGEL was inspired by Bruegel's famous
painting "Blue cloak" (1559) shown in Figure 1. This painting is named after
the Netherlandish (Flemish) proverb "She hangs a blue cloak (lies) around
her husband". This proverb is "visualized" in this painting and is marked by
the center box in the picture below.




Figure 1. Pieter Bruegel’s painting “Blue cloak” (1559), oil on oak panel, 117 x 163 cm. 
(with permission from Staatliche Museen zu Berlin - Gemaldegalerie, Berlin). See also color 
plates. 
From the visual correlation viewpoint, the uniqueness of this painting is
that Pieter Bruegel "compressed" and "visualized" a total of at least 78
Netherlandish proverbs, maxims, rhymes and symbols [Foote, 1968].

In addition, visual correlation is supported by locating related proverbs 
in the same area of the painting. In this way Pieter Bruegel “visualizes” and 
conveys more information than can be provided by each individual proverb. 
Two proverbs, marked by the right-hand box in Figure 1, are shown in more 
detail in Table 1. The first two rows show these proverbs separately and the 
third row shows them side-by-side as Pieter Bruegel painted them. It is simi- 
lar to the modern concept of side-by-side visual correlation that we will 
discuss later in this chapter. 

The visual correlation of two proverbs reveals the deeper meaning of 
each proverb and their combinations. In essence, each painted proverb fulfils 
the role of an iconic summary of a complex concept. Their combination, as 
noted in the last line of the table, is a prototype for compound and com- 
posed icons that are further discussed in chapter 9. 



Table 1. Encoding text in art. See also color plates.

Raw text:                              Text metaphor:               Image metaphor:
compressed content of the text         proverb                      icon

A greater power controls a smaller     Big fish eats little fish    [image]
power with a brute force and
violence

A greater power controls a smaller     He catches fish with         [image]
power in a smart way without using     his hands
a brute force and violence

A greater power controls a smaller     Big fish eats little fish    [image]
power with a brute force while         while he catches fish
another greater power controls a       with his hands
smaller power in a smart way
without using a brute force and
violence


Visual correlation is subject to two major interdependent challenges: (1) 
distortion of the actual relation and (2) excessive guidance necessary for the 
user to avoid distortion. Bruegel’s proverb example above illustrates this 
point as we discuss below. Distortion of the actual relation R(a,b) between 
objects a and b may have several sources: 
• inappropriate and/or insufficient guidance for extraction of a relation
R(a,b) from visualized data,

• human misperception of the visualized data, and

• human visual expectations.

It is difficult to meet both challenges simultaneously. Less intensive 
guidance can increase misperception, specifically, the difference between an 
actual relation R(a,b) and a perceived relation Q(a,b) between objects a and 
b. In some cases, the actual relation R(a,b) can be lost completely because 
of misperception. Obviously, if we have no knowledge about the “Blue 
cloak” proverb and its meaning, we would not be able to recognize the rela- 
tion R(a,b) = “a is lying to her husband b”. 

Textual guidance can explain the proverb, but if every visual symbol 
needs an excessive explanation then visualization is not serving its role - to 
convey information faster than the traditional text-based form of communi- 
cation. 

In the following sections, we define concepts, review current visual cor- 
relation studies, and provide examples, a classification of VC methods, and 
criteria to assess the quality of visual correlation. 

1.2 Definitions of concepts 

Current studies on visual correlation range from formally defined, classi- 
cal linear correlation in statistics to very informally defined correlation be- 
tween natural language statements and images. 

Correlation can be defined as: 

• a tool for combining multiple observations of the same O/E when data
can be expressed precisely in mathematical or statistical form, or

• a tool for combining measurements involving single or multiple
phenomenologies (e.g., radar, sonar), often in near-real time [Marsh,
2000].

Correlation can be contrasted with data fusion and semantic data integra- 
tion as follows: 

• Fusion - a tool for combining information of very different types, 
such as sensor data, imagery, or human reports. It is often non-real 
time. 

• Semantic Integration - a tool for combining information where individual
meanings and relationships can infer a larger meaning. It may involve some
degree of contextual reasoning [Marsh, 2000].

In these terms, the Bruegel painting probably better fits into the semantic 
integration category. 

Other definitions of correlation include fusion and integration under the
correlation umbrella. The core term used in these definitions is combining
O/Es. From our viewpoint, the essence of correlation, fusion, and integration
is generating new knowledge in the form of a clearly established link or
relation between different O/Es.

Classical correlation. We start our analysis with classical correlation 
and its visualization. Traditionally, correlation is expressed by using a cor- 
relation coefficient as a measure of association (interdependence) between 
two or more variables. A brief description of different correlation coeffi- 
cients used for numeric and binary variables is shown in Table 2. Measures 
of interdependence among several variables include: multiple correlation, 
marginal correlation, conditional correlation, canonical correlation, and auto- 
and cross-correlation for ensembles of measurements. 



Table 2. Correlation coefficients between numeric or binary variables

Correlation coefficient (CC)  Used to evaluate the relationship between
----------------------------  ---------------------------------------------------------------
Simple CC                     Two numeric variables
Auto CC                       The same numeric variable at two different times
Cross CC                      Two different data sets at different lags (a function of lag)
Rank CC                       Two variables when the distribution of variables is not normal
Point biserial CC             Continuous variable and a binary variable
Tetrachoric CC                Two artificially dichotomous normally distributed variables
Contingency CC                Two nominal level variables
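
To make the first rows of Table 2 concrete, the following minimal sketch computes the simple, rank, auto, and cross correlation coefficients for short numeric sequences. It is an illustration only and is not taken from the chapter: the function names are hypothetical, and the lagged coefficients are computed here simply as the simple CC of the overlapping segments, which is one of several estimators in use.

```python
import math


def pearson(x, y):
    """Simple CC (Pearson) of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def rank_cc(x, y):
    """Rank CC (Spearman): the simple CC applied to ranks."""
    return pearson(ranks(x), ranks(y))


def cross_cc(x, y, lag):
    """Cross CC of x[t] with y[t + lag] for a non-negative lag
    (assumption: simple CC of the overlapping segments)."""
    return pearson(x[: len(x) - lag], y[lag:])


def auto_cc(x, lag):
    """Auto CC: the series correlated with itself shifted by `lag`."""
    return cross_cc(x, x, lag)
```

For a strictly alternating series such as 0, 1, 0, 1, ... the auto CC is -1 at lag 1 and +1 at lag 2, which matches the intuition of a period-two signal.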



Generalized correlation. Data that need to be correlated are not limited
to numeric or binary variables. For instance, we may need to correlate roads
or drainage systems on maps with imagery from different sources that may not
have a common scale, may have different rotations, and may have no common
reference points. This task is commonly known as conflation and is discussed
in Chapters 17-21.

There are also a variety of specialized correlations such as angular corre- 
lation and correlation for emitter error with confidence ellipses and time for 
identifying emitter locations using information from a radio direction- 
finding system [Mikulin & Elsaaesser, 1994]. 

The mathematical definition of correlation assumes that variables are
specified in advance and that a procedure for testing the significance of
relationships between variables is also given in advance and expressed as a
mathematical formula. In the mathematical approach to correlation, we first
observe (discover) some relationship between variables. The result is not the
relationship itself but a conclusion about its statistical significance, that
is, a conclusion that the relationship is reproducible.

Next, this approach can be applied to the situation where (i) objects and
events are described by a few numeric variables with (ii) a large number of
values available. However, this situation is quite different from one where
recorded O/Es represent social events such as terrorist activities where (iii)
objects and events are represented by a large number of non-numeric
descriptors (textual or multimedia-based) and (iv) very few records for each
individual descriptor are available.

Indeed, this difference makes the direct use of classical correlation
methods almost irrelevant for tasks such as analysis of terrorist activities.
To make classical correlation relevant, intensive (and non-trivial)
preprocessing is needed to generate a large number of values of a few numeric
variables. Note that this process is very task specific and may not scale
well, so the success and generality of such an endeavor are questionable. This
consideration shows that a generalized concept of correlation that can handle
complex objects and events is very desirable and not trivial.



1.3 Visual correlation categories 

Current visual correlation practices and methods go beyond classical cor- 
relation and vary in their level of the exact presentation of correlation to the 
user. We distinguish the following presentation levels: 

• High level. A system identifies exact and clear correlation between
O/Es in advance as a part of the design stage.

• Medium level. A system does not correlate objects in advance, but 
provides a user with interactive tools such as a curve-matching cur- 
sor. 

• Low level. A system mostly relies on human perception, providing 
similar graphical or multimedia presentations of correlated entities 
and some pointing mechanisms. 

Another important feature of a VC environment is level of knowledge of 
correlation. Is the correlation already known or has it been previously dis- 
covered? In classical visualization, it is typically assumed that the correla- 
tion has already been discovered. In visual data mining, it is assumed that 
correlation needs to be discovered. Thus there are two major categories of 
visual correlation: 

(VC1) Classical visualization: visualization of existing correlations
between objects and events and

(VC2) Visual data mining: a process of finding correlations visually
between O/Es.

For example, in VC1, we assume that a correlation such as y = x - 100
between human height in cm (x) and weight in kg (y) has already been found
and is then visualized as a plot. On the other hand, in VC2, the correlation
y = x - 100 should be discovered visually from a plot. Clearly, one can also
imagine a spectrum of possible combinations of VC1 and VC2. Both VC types
can be combined with different levels of exact definition and presentation
of correlation. The chances of building specific combinations are illustrated
in Table 3, with more smiley faces indicating a better chance. The best
chance of building a successful visual correlation system arises when the
relations to be visualized are known and the system relies mostly on human
perception and provides similar graphical or multimedia presentations of
related O/Es along with some pointing mechanisms. This is depicted with a
high value of five faces. The most challenging task is to build a system
that would be able to present correlations visually that are not known yet
and need to be discovered from given data. This option is shown as an empty
cell in Table 3.

Table 3. Types of visual correlations

                                Level of exact presentation of correlation
Level of correlation knowledge  High      Medium    Low
------------------------------  --------  --------  --------
Correlation is already known    ☺☺☺       ☺☺☺☺      ☺☺☺☺☺
(Classical Visualization, VC1)
Correlation is not known yet              ☺         ☺☺
(Visual Data Mining, VC2)



The goal of visualizing known relations (VKR) is assisting an unaware
individual. The goal of discovering unknown relations visually (DRV) is
assisting everybody, because everybody is unaware in this case. For instance,
if the mathematical linear relation y = ax + b is known, then a specific
ignorant individual will benefit from its visualization as a straight line
plot. Otherwise, if this relation is not previously known, everybody would
benefit.
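
As a computational counterpart to the two roles above, the sketch below recovers the coefficients of a linear relation y = ax + b from data by ordinary least squares. This is only an illustration under stated assumptions: the chapter concerns visual discovery, not numeric fitting, and the function name and the sample height and weight values are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimate of (a, b) in y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Hypothetical data that follow the chapter's example y = x - 100
# (height in cm, weight in kg); the numbers are illustrative only.
heights = [160.0, 170.0, 180.0, 190.0]
weights = [60.0, 70.0, 80.0, 90.0]
a, b = fit_line(heights, weights)
```

In the VC1 setting the pair (a, b) is given and only plotted; in the VC2 setting it is exactly what the observer has to read off the plot.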

The fundamental difference between VC1 and VC2 is in the level of possible
guidance. For a known relation, it is possible to guide an ignorant person
to see the relation through visualized data. The situation is quite different
for unknown relations. Who can guide the understanding of the unknown
relation? Thus, the classical visualization task, VC1, is more specific and
should be analyzed first. The progress in technology for this task will also
help form a solid base for visual data mining, VC2.



2. EXAMPLES OF NUMERIC VISUAL
CORRELATIONS

2.1 High-level numeric visual correlation 



In this section, we provide examples of high-level visual correlation for
numeric data that require little guidance for a perceiver. In all examples, it
is assumed that the relation to be visualized is already known and available
for the system. The software system computes and visualizes the relationship
in a single VC panel based on the user’s data. The user’s role is relatively
passive and involves evaluating the VC without generating alternative visual
correlations.

Our first example is classical Linear or Curvilinear correlation using
Cartesian coordinates as shown in Figure 2(a). Initially, it may seem that
the relation need not be known in advance; that it can be discovered visually
by observing the plot. While it is true that the plot can be used for
discovery, that is a different role from communicating a discovered relation
to an ignorant person. In general, the same visualization may or may not serve
both functions.




[Figure 2 panels: (a) Linear and Curvilinear Cartesian correlation plots of
two variables. (b) Parallel coordinate correlation plot of five variables.
(c) Cartesian correlation of two numeric variables vs. time.]

Figure 2. Cartesian and Parallel correlation plots for homogeneous numeric variables

The next example is a Parallel coordinate plot [Inselberg, 1997], which
uses Cartesian coordinates differently by placing them in parallel, vertically.
Figure 2(b) shows five parallel coordinates X1, X2, X3, X4, and X5. In parallel
coordinates, every 5-dimensional O/E such as (0.5, 0.75, 1.0, 0.75, 0.5) is not
a point in 5-dimensional Cartesian coordinates but rather a line connecting
its values. Object (0.5, 0.75, 1.0, 0.75, 0.5) is shown in Figure 2(b) as the
top dark line. Cartesian visualization is limited to 2-D and 3-D, but parallel
coordinate visualization is truly multidimensional. Figure 2(b) shows 5
coordinates and we still have space to put five more coordinates. This
visualization is efficient if O/Es have well-distinguished clusters such as
the “convex” and “concave” clusters shown in Figure 2(b).
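
The mapping from an n-dimensional O/E to its line in parallel coordinates can be sketched in a few lines. The sketch assumes that axis i is drawn as the vertical line at horizontal position i times a fixed spacing; the function name and the spacing parameter are hypothetical and serve only to illustrate the idea.

```python
def polyline(point, axis_spacing=1.0):
    """Vertices (horizontal position, value) of the polyline that
    represents one n-dimensional point in parallel coordinates.
    Assumption: axis i is the vertical line at i * axis_spacing."""
    return [(i * axis_spacing, value) for i, value in enumerate(point)]


# The 5-dimensional object used in the text.
vertices = polyline((0.5, 0.75, 1.0, 0.75, 0.5))
# vertices == [(0.0, 0.5), (1.0, 0.75), (2.0, 1.0), (3.0, 0.75), (4.0, 0.5)]
```

Adding a sixth or tenth coordinate only appends vertices, which is why this representation is not tied to two or three dimensions.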

Figure 2(c) displays a typical correlation of time series. Each time
series is a 2-dimensional O/E. The visual correlation of two 2-dimensional
O/Es with a shared domain (usually time, t) is possible by overlaying two
Cartesian plots. The time series correlation presented in Figure 2(c) can also
be viewed as a parallel coordinate correlation. It is sufficient to consider
x(t) for t = 1, 2, ..., n as a set of separate variables x_t: x_1, x_2, ..., x_n.

This differs from typical parallel coordinate use where m is much greater
than n, m >> n, with m the number of O/Es. In time series correlation it is
just the opposite, n >> m. That is, we correlate 2-3 time series with hundreds
of time measurements. This leads to another reason for visualizing
differently. With hundreds of parallel time coordinates x_t, the space between
them is very small and often there is no need to draw lines connecting x_t and
x_{t+1}.

Note that all three of the visualizations presented could also be used for
discovering relations between O/Es.

2.2 Medium-level numeric visual correlations 

With medium-level VC, an exact correlation match is not provided and, 
using the visualization, the user needs to be able to explore and discover an 
appropriate relation. 

For example, a VC might present a user who has many variables x_1, x_2, ...,
x_n with many correlation plots of different pairs x_i and x_j. The user would
then be able to identify potential correlations from the plots. These plots are
typically organized on a grid. Figure 3 displays an example where several of
the plots exhibit clear correlation.

In the VC, some correlations would be left for the user’s perception,
recognition, and discovery, while other correlations would be pointed out
explicitly. Clearly, the software would visualize n^2 pairs when n variables
are used. The user’s role in the VC is active as the user is responsible for
selecting variables for the VC, performing evaluation in the VC, and selecting
interesting relations out of those presented.
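
The grid of n^2 pairs can be sketched numerically: each cell of the grid corresponds to one ordered pair of variables. The sketch below computes the simple correlation coefficient for every cell and lists the pairs a system might point out explicitly. The function names and the 0.9 threshold are assumptions made for illustration; the chapter does not prescribe how cells are selected.

```python
import math


def pearson(x, y):
    """Simple correlation coefficient of two numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_grid(columns):
    """Map every ordered pair of variable names to its coefficient,
    giving n * n cells for n variables."""
    names = list(columns)
    return {(p, q): pearson(columns[p], columns[q]) for p in names for q in names}


def strong_pairs(grid, threshold=0.9):
    """Unordered pairs whose |r| reaches the (hypothetical) threshold."""
    return sorted((p, q) for (p, q), r in grid.items() if p < q and abs(r) >= threshold)


# Illustrative data for four variables.
data = {
    "x1": [1.0, 2.0, 3.0, 4.0],
    "x2": [2.0, 4.0, 6.0, 8.0],
    "x3": [4.0, 3.0, 2.0, 1.0],
    "x4": [1.0, -1.0, 1.0, -1.0],
}
grid = correlation_grid(data)
highlighted = strong_pairs(grid)
```

Cells below the threshold remain on the grid and are left to the user's perception, in line with the medium-level role described above.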




Figure 3. A grid panel of the pairwise correlation for four homogeneous numeric variables

Figure 4 shows another visual correlation method known as a Table
Lens [Rao & Card, 1994, 1995; Pirolli & Rao, 1996]. This is a table of
individual distributions of numeric variables shown side-by-side. Users are
also active in this VC. In Figure 4, the user can visually correlate
distributions by noticing that the first and fourth distributions are similar.



[Figure 4 panel: distributions of X1, X2, X3, X4, X5; axis: frequency]

Figure 4. Table Lens: a table of individual distributions of numeric variables

The next visual correlation method we turn to is a 3-D Visualization 
Spreadsheet with multiple visualizations. Here rows represent different ob- 
jects or the same object at different times and columns represent alternative 
visualizations for each object. One way alternative visualizations can be 
produced is by showing the object from viewpoints that might reveal differ- 
ent aspects of the object and thus permit correlation. Indeed, if the object 
represents an actual 3-D object then the projections have a physical interpre- 
tation. 

A more general correlation option arises when the object displayed is a
glyph that encodes attributes of some other object. As an example, suppose
the object to be represented has six numeric variables x_1, x_2, ..., x_6. These
variables could be encoded as characteristics of a visual glyph, here a
pyramid, as follows:

x_1 - height of the pyramid,

x_2 - width of the pyramid,

x_3 - color of side 1,

x_4 - color of side 2,

x_5 - color of side 3, and

x_6 - color of side 4.
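
The encoding listed above can be sketched as a small data structure. The sketch assumes that all six variables are normalized to the range from 0 to 1 and that colors are gray levels from 0 to 255; both choices, as well as the names `PyramidGlyph` and `encode`, are hypothetical and are not specified in the chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PyramidGlyph:
    height: float
    width: float
    side_colors: tuple  # four gray levels, one per side (assumption)


def encode(values, max_size=10.0):
    """Encode six variables in [0, 1] as a pyramid glyph:
    x_1 -> height, x_2 -> width, x_3..x_6 -> colors of sides 1..4."""
    if len(values) != 6:
        raise ValueError("expected exactly six variables")
    x1, x2, x3, x4, x5, x6 = values
    colors = tuple(round(255 * v) for v in (x3, x4, x5, x6))
    return PyramidGlyph(height=x1 * max_size, width=x2 * max_size, side_colors=colors)


glyph = encode([0.5, 0.25, 0.0, 1.0, 0.2, 0.8])
```

Two objects with similar variable values then yield glyphs of similar size and shading, which is what the user is expected to correlate visually.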

In the VC, every cell, row, and column can be graphically manipulated
simultaneously. Columns might also encode very different characteristics.
For instance, columns 1 and 2 might present glyphs while columns 3 and 4
present bar charts of statistical distributions of the parameters.

This idea was used in [Chi, 1999] for finding interrelationships and
dependencies among variables for a pattern recognition problem and for
protein analysis.

Figure 5 demonstrates the 3-D spreadsheet with an object measured at
three different times encoded as a glyph with 20 variables. Each column
shows five of these variables.



[Figure 5 rows: Time t1, Time t2, Time t3]

Figure 5. A 3-D multiview visual correlation spreadsheet

Dynamic Interactive Pointers. In this visual correlation method the system
designer provides software for displaying and interactively linking two
side-by-side panels. The user correlates these panels with a curve-matching
cursor. Color strips between the links correlate objects (see Figure 6). This
idea has been used in geology for spatial interwell correlation as an
extension of the method of respective correlation in geology [Haites, 1963].

Figure 6. Dynamic Interactive Pointers



Dynamic object visual correlation. This VC permits a user to explore
the dynamics of an object encoded as a shape (glyph) that represents n
variables of the object at a given time t. Dynamic changes are presented using
pointers for successive times such as t+1, t+2. Similar dynamics may be used
for spatial movement of an object from location S1 to successive locations
such as S2, S3. Figure 7 depicts this VC.

Figure 7. Dynamic object visual correlation



2.3 Low-level numeric visual correlations 

In the low-level VC, a user visually correlates objects. Such correlation
may be unstable because it depends on the scaling of numeric variables.
Scaling may relocate and change visual objects used to represent and discover
patterns. In 3-D Glyph correlation each object is described as an
n-dimensional string of numeric attributes that are mapped into 3-D boxes or
glyphs in a single panel. Colors, sizes, orientations, and shapes of boxes are
used to represent numeric attributes. The system designer provides software
that visualizes these data. The user selects variables for VC and visually
correlates objects (boxes) using natural perception (see Figure 8).

Shape glyphs. Every shape encodes several parameters of the data via 
its color, height, width, rotation, and shape type. This approach has been 
used with a variety of tasks where each object can be represented by an indi- 
vidual shape glyph. 




Figure 8. Examples of low-level visual correlations based on glyphs. See also color plates.

2.4 Examples of multi-type visual correlation

Pointer method. Multi-type and multi-source heterogeneous visual cor- 
relation is needed when we try to correlate objects with non-numeric attrib- 
utes, with a mixture of numeric and non-numeric attributes, and with attrib- 
utes from different layers of attribute hierarchies. 

Figure 9 depicts a multi-source heterogeneous example of high-level visual
correlation, where image and text descriptors are correlated by pointers and
commands are correlated by posting their list side-by-side with an image.



[Figure 9 panels: a Concepts list, a Visual case image linked to the
concepts by pointers, and a Commands list (Command 1 to Command 5)]

Figure 9. Pointer-based visual correlation for multi-type data



A system designer can link several heterogeneous panels in advance, for
example: an image, text descriptors, and commands. A user would then apply
pointers to correlate text concepts with the visual image. This idea was
implemented in [Novak, 1995].

Rainbow correlation. Figure 10 demonstrates a multi-source heterogeneous
high-level visual correlation. Entities (documents, people or concepts)
are represented as small dots on a two-dimensional plane connected by colored
arcs above and below the plane. Relationships are shown by the location of
the dots, arcs and their colors [Hetzler, Harris, Havre & Whitney, 1998].




Metaphoric spatial visual correlation. In this correlation multidimensional
data (objects, events) are represented in 2-D or 3-D as points in specific
locations using a variety of dimension reduction techniques. Related O/Es are
located nearby or as “galaxies.” Users can list locations and events of
interest and then use the correlation tool to quickly identify an event’s
location. This idea is implemented in self-organized maps (SOM) initiated
by Kohonen via techniques called Galaxies and ThemeView [Hetzler &
Miller, 1998]. Figure 11 depicts this correlation method.




Figure 11. Metaphoric spatial visual correlation methods for multi-source data



Geospatial visual correlation. Heterogeneous geospatial data are correlated
using a variety of methods; one of them is known as the Magic Lens. In this
method the user selects an area for magnification and the system reveals
magnified objects using different display metaphors and layouts using query
and text modes. See Figure 12(a). Typical magic lens systems support
hierarchical views; see, for example, [Shaffer & Reed, 1999].

Linked panels is another widely used method of visual correlation. See
Figure 12(b). Panels of different levels are linked by an inserted rectangle
for a region and by a pointer for a country.





(a) Magic lens (b) Linked panels 

Figure 12. High-level geospatial visual correlation for objects of different levels of resolution. 

See also color plates. 



A variety of multi-source medium- and low-level visual correlations
are also in use. Customizable, coordinated, side-by-side panels
(Snap-Together Visualization) are among them. The user visually correlates
the contents of several panels. Users query their relational database and load
results into a desired visualization. They then specify how to coordinate the
various visualizations when selecting, navigating, or re-querying. Some
correlations are left to the user’s perception, recognition, and discovery.
Some correlations are pointed out explicitly [North, 2000]. See Figure 13(a).




Figure 13. Medium- and low-level visual correlations of heterogeneous data 



Similarly, another side-by-side visual correlation for heterogeneous
data has been implemented by combining graphics and images. Working
with a setup such as Figure 13(b), a user could correlate an image of points
in Cartesian coordinates with the portrait by recognizing the portrait as that
of the inventor of those coordinates. The correlation is carried out by a
comment such as “Descartes R., 1596-1650 and a visual correlation in Cartesian
coordinates.”



3. CLASSIFICATION OF VISUAL CORRELATION 
METHODS 

Metaphors for visual correlation tasks. Appealing to common knowledge
via metaphors for discovering and displaying correlations may make visual
correlation both faster and easier. Several examples of this type of metaphor
were shown in the previous section; recall rainbow correlations.



Table 4. Metaphors for visual correlation tasks

Task                  Metaphor                             Familiar knowledge
--------------------  -----------------------------------  -----------------------------
Visual correlation    Spreadsheet of icons                 Table, kids’ puzzle cubes
                      3-D trees of icons                   Orchard work
                      Room with floor, walls, and ceiling  Spatial structure of building
                      displaying parts of the task
Learning visual       Travel in iconic world               Tours, guides, navigation
correlation tools
Collaborative visual  Multi-agents                         Travel agents
correlation work

Table 4 lists more metaphors that can be used in visual correlation tasks.
Two of them were implemented in the Bruegel project (see Chapter 10); these
were the spreadsheet of icons and 3-D trees of icons.

Classification. Chi & Riedl [Chi & Riedl, 1998] offer a classification
scheme for data visualization methods. This is an “internal” classification
based on operations for transforming data into the visual form.

It provides a unified description of many well-known data visualization
techniques such as: Dynamic Querying [Ahlberg & Shneiderman, 1994],
AlignmentViewer [Chi, Riedl, Shoop, Carlis, Retzel & Barry, 1996], Parallel
Coordinates [Inselberg, 1997], SeeNet [Becker, Eick & Wilks, 1995],
ThemeScape and Galaxies [Wise, Thomas, Pennock, Lantrip, Pottier,
Schur & Crow, 1995], Hierarchical Techniques: Cone tree [Robertson,
Mackinlay & Card, 1991], Hyperbolic Browser [Lamping, Rao & Pirolli,
1995], TreeMap [Johnson & Shneiderman, 1991], Disk Tree [Chi et al.,
1998], Perspective Wall [Mackinlay & Robertson, Card], WebBook and
WebForager [Card, Robertson & York, 1996], Table Lens [Rao & Card,
1994, 1995], Time Tube [Chi et al., 1998], Spreadsheet for Images
[Levoy, 1994], FINESSE [Varshney & Kaufman, 1996], and Spreadsheet for
Information Visualization [Chi, Barry, Riedl & Konstan, 1997].

Visual correlation is in need of an “external” classification scheme that 
reflects the goal of supporting the correlation of complex objects and events. 
This means that we would classify methods based on how they present 
correlated objects to a user and less on how the visualization has been 
obtained from original data. 

Examples presented in the previous section served as the basis of the 
classification system shown in Tables 5 and 6. We distinguish types, sub- 
types and individual visual correlation methods. Here some types are the 
same as individual methods. 



Table 5. Classification of visual correlation methods: simple structures

VC Type          VC subtype                     Visual correlation method
---------------  -----------------------------  -----------------------------------------
Single panel     Points and lines for a         Linear correlation plot
                 single dataset                 Curvilinear correlation plot
                 Points and lines for           n visualized entities in the single panel
                 multiple datasets
                 Glyphs                         2-D glyph correlation
                                                3-D glyph correlation
                                                Mix of 2-D and 3-D glyphs
Line of panels   Static pointers                Panel contents linked by pointers
side-by-side     Dynamic interactive pointers   User sets up links interactively
                 n panels side-by-side          n abstract visualizations side-by-side
                                                n real-world pictures side-by-side
                                                (e.g., n X-ray films side-by-side)

Table 6. Classification of visual correlation methods: complex structures

VC Type               VC subtype                        Visual correlation method
--------------------  --------------------------------  --------------------------------------------
Tree of panels        Vertical tree of panels           Vertical tree of panels
                      Horizontal tree of panels         Horizontal tree of panels
                      Centered tree of panels           Centered tree of panels (root in the center)
                      (root in the center)
Grid of panels        Table of n x n panels             Grid of correlation and distribution plots
(spreadsheet)
Network of panels     Static pointers                   Static pointers
                      Dynamic interactive pointers      Dynamic interactive pointers
                      Side-by-side panels               Side-by-side panels
Nested panels         Nested panels for hierarchical    Nested geographic maps and events
                      views
Panels in 3-D         Mountain panel                    Mountain panel
                      Fish eye                          Fish eye
                      Room                              Room
                      Gallery                           Gallery
                      Cone/disk tree                    Cone/disk tree
Zooming and popping   Standard zooming                  Geographic map zooming (2D or 3D)
up panels             Zooming with changing             Magic Lens (2D or 3D)
                      metaphors and layouts
Panels spread over    Combinations                      Combination of above listed methods
several monitors



4. VISUAL CORRELATION EFFICIENCY 

Visual correlation shares many efficiency criteria with visualization in 
general. Development of such measures is an important part of visualization 
theory. The problem of measuring information density score (IDS) for visual 
systems is highly nontrivial while the problem of measuring information 
density for text is well known and has been studied for a long time. 

Claude Shannon described a measure of information, known as information
entropy, applicable to transmission of information using communication
channels. In visual correlation, we have a specific communication channel:
that of transmitting information from a computer to a human.
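
For reference, Shannon's entropy of a discrete source with symbol probabilities p_i is H = -sum(p_i * log2(p_i)), measured in bits per symbol. The sketch below estimates it from the observed symbol frequencies of a sequence. It is a generic illustration of the measure; how entropy would be mapped to the information density of a visual display is not specified here, and the function name is hypothetical.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Empirical Shannon entropy (bits per symbol) of a sequence,
    using observed relative frequencies as probabilities."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A sequence of identical symbols carries 0 bits per symbol, two equally frequent symbols carry 1 bit, and four equally frequent symbols carry 2 bits.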

Shannon’s approach was developed further by [Yang-Pelaez, Flowers,
2000] at MIT as a measure of information content for quantifying the relative
effectiveness of displays in different visualizations. We build on this
approach by developing criteria that may be specifically applied to the
evaluation of visual correlation efficiency. The criteria are presented in
Tables 7 and 8. Table 7 presents time, speed and information density
characteristics and Table 8 considers their relative characteristics.

For comparison with text, the major characteristics are text correlation
time (TCT) and text correlation speed (TCS), which evaluate the time and
speed of capturing the correlation of O/Es presented in text form.



Table 7. Base characteristics of visual correlation efficiency

Characteristic        Description                     Comment
--------------------  ------------------------------  --------------------------------------------
Visual correlation    Time for catching correlation   If VCT is relatively small then VC can be
time (VCT)            visually.                       used in time-critical and/or information
                                                      overloaded tasks.

Information density   Amount of information           If IDS is relatively large then VC can
score (IDS)           presented visually. IDS is      handle large applications.
                      measured for each separate      Visualization and visual correlation can be
                      panel and screen and as a sum   viewed as a specialized data compression
                      of them (integral measure).     method (only a human can work with this
                      Amount of information can be    compressed information). IDS is an
                      measured in bits or bytes,      indicator of the efficiency of such
                      Kb, Mb, and Gb.                 compression.

Speed of VC (SVC)     SVC = IDS/VCT: amount of        If SVC is relatively high then VC can be
                      information consumed per time   used in time-critical and information
                      unit.                           overloaded applications.



The source of visual correlation efficiency, as compared to text
correlation, is the higher speed of visual parallel information processing
(PIP) compared with the sequential nature of textual analysis. Thus, the speed
of visual correlation (SVC) ties in with the speed of parallel information
processing (SPIP).

Computing SPIP may require measuring the amount of information processed
per time unit in both visual and textual O/E correlation (that is indicated
by information density, IDS). However, one can still measure relative
time in experiments without explicitly measuring information density of the
VC.
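
The arithmetic behind these characteristics is simple and can be sketched as follows. The sketch assumes that speed is information divided by time (as in SVC = IDS/VCT) and that relative characteristics are plain ratios against a baseline such as text; the function names and all numeric values are hypothetical and do not come from any experiment.

```python
def speed(information_bits, seconds):
    """Information consumed per time unit, e.g. SVC = IDS / VCT."""
    if seconds <= 0:
        raise ValueError("time must be positive")
    return information_bits / seconds


def relative(value, baseline):
    """Ratio of a characteristic to a baseline, e.g. VCT / TCT."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return value / baseline


# Hypothetical measurements (illustrative numbers only).
ids_bits = 8000.0   # information presented, in bits
vct = 4.0           # seconds to capture the correlation visually
tct = 40.0          # seconds to capture it from text

svc = speed(ids_bits, vct)          # visual correlation speed
tcs = speed(ids_bits, tct)          # text correlation speed
relative_time = relative(vct, tct)  # below 1 favors the visual form
relative_speed = relative(svc, tcs) # above 1 favors the visual form
```

Note that the relative time can be computed from the two timings alone, which is why an experiment can compare visual and textual forms without first measuring information density.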



Table 8. Comparative criteria for visual correlation efficiency

Characteristic        Description                               Comment
--------------------  ----------------------------------------  ----------------------------------------
Relative VC time      Time of visual correlation (VC1)          If RVCT is relatively low, then VC can
(RVCT)                relative to another visual correlation:   be used in time-critical and information
                      RVCT = VCT1/VCT2; RVCT = VCT1/TCT.        overloaded tasks.

Relative VC speed     Speed of visual correlation (VC1)         If RVCS is relatively high then VC can
(RVCS)                relative to another visual correlation    be used in time-critical and information
                      (VC2) or text TCS:                        overloaded tasks.
                      RVCS = SVC1/SVC2; RVCS = SVC1/TCS.

5. VISUAL CORRELATION: FORMAL
DEFINITIONS, ANALYSIS, AND THEORY

Challenges. In Section 1, we began the discussion of visual correlation
challenges, including (1) distortion of the actual relation R to be visualized
and (2) excessive guidance required for the user to avoid distortion, by
referring to the famous painting of Pieter Bruegel (see Figure 1).
Often it is impossible or very difficult to meet both challenges
simultaneously. Less intensive guidance can increase misperception, but
intensive textual guidance may mean that visualization is not serving its
role, namely conveying information more efficiently than traditional text.
In what follows, we provide a more formal discussion of this subject.

5.1 Visualization of known relations 

Definitions. Visualization of a known relation R(a,b) between O/Es a
and b involves two components:

• a visual representation V(R(a,b)) of relation R(a,b) and 

• the relation Q(a,b)=P(V(R(a,b))) perceived by a person from the vis- 
ual representation V(R(a,b)) . 

These components form a natural sequence from relation R(a,b) between
O/Es a and b to the visual representation of R(a,b), and then to the relation
Q(a,b) that is perceived by a person in the process of observing the visual
representation of R(a,b). This sequence can be presented in a compact
symbolic form:

R(a,b) ⇒_V V(R(a,b)) ⇒_P Q(a,b)          (1)

where ⇒_V means produced by a visualization design tool, ⇒_P means
produced by a visual perception mechanism, and Q(a,b) = P(V(R(a,b))).

Ideally, we should have Q(a,b) = R(a,b). That is, the perceived relation
Q(a,b) should be the same as relation R(a,b):

R(a,b) ⇒_V V(R(a,b)) ⇒_P R(a,b)          (2)

Property (2) is the ultimate goal of visualizing known relations. Formulas (1)
and (2) encode two steps:

(S1): Produce a visual representation of the relation R:

R(a,b) ⇒_V V(R(a,b))

(S2): Perceive the visual representation of the relation R:

V(R(a,b)) ⇒_P Q(a,b).

Both steps S1 and S2 can produce corruptions of the relation R(a,b) and
finally produce a relation Q(a,b) that differs significantly from R(a,b);
that is, Q(a,b) ≠ R(a,b).

Distortion. Below we provide an example of relation distortion. Let us
consider the simple relation between two numbers a=2 and b=4: R(2,4)=“the
number 2 is two times smaller than the number 4.” Figure 14(a) visualizes
this relation by matching a and b with the radii of the circles so that
$r_b = 2r_a$. In this case, the relation $Q_1$ to be perceived is as follows:

$$Q_1(a,b) = \text{true} \Leftrightarrow r_b = 2r_a.$$

Figure 14(b) visualizes the same relation by presenting two circles with areas
$S_a$ and $S_b$ where the first area is half the second area, $S_b = 2S_a$. In
this case, the relation $Q_2$ is quite different:

$$Q_2(a,b) = \text{true} \Leftrightarrow S_b = 2S_a,$$

that is, $Q_2(a,b) = \text{true} \Leftrightarrow \pi (r_b)^2 = 2\pi (r_a)^2$.

[Figure 14: two pairs of circles, panels (a) and (b); image not reproduced.]

Figure 14. Radius and area visualization metaphors 

On the other hand, suppose we derive $Q_2$ from $Q_1$. Since $r_b = 2r_a$ in
$Q_1$, we would have area $S_a = \pi (r_a)^2$ and area
$S_b = \pi (r_b)^2 = \pi (2r_a)^2 = 4\pi (r_a)^2 = 4S_a$.
Thus, the relation $r_b = 2r_a$ for radii is equivalent to the relation
$S_b = 4S_a$ when converted to areas. This is double the ratio originally
expressed by $Q_2$. Without guidance, a person does not know what relation to
use, $Q_1$ or $Q_2$, for comparing alternatives. While the radius relation is
a correct visual representation for relation R(a,b), without guidance a person
may compare areas in Figure 14(a) even without consciously noticing it. As a
result, the person might conclude that a is four times smaller than b. This is
neither relation $Q_1$ nor $Q_2$ but rather a third relation $Q_3$:

$$Q_3(a,b) = \text{true} \Leftrightarrow S_b = 4S_a.$$

Translated back into a relation between numbers, this produces an incorrect
statement, namely that the number a is one quarter of the number b.
Surely then, Figure 14(a) requires guidance so that the radii of the circles
are compared. Without this guidance, areas might be compared, in which
case the actual relation R(a,b) would not be discovered. Indeed, even with
correct guidance, a person may still not be able to extract the relation R(a,b)
precisely because of misperception.
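The radius and area readings of the same pair of circles can be compared numerically. The following is a minimal sketch, not part of the original text; the function names are our own and the radii are simply set equal to the data values a=2 and b=4.

```python
import math

def circle_area(radius: float) -> float:
    """Area of a circle with the given radius."""
    return math.pi * radius ** 2

def ratio_read_from_radii(r_a: float, r_b: float) -> float:
    """Ratio a viewer obtains when comparing radii."""
    return r_b / r_a

def ratio_read_from_areas(r_a: float, r_b: float) -> float:
    """Ratio a viewer obtains when (perhaps unconsciously) comparing areas."""
    return circle_area(r_b) / circle_area(r_a)

# Radius encoding in the style of Figure 14(a): radii proportional to a=2, b=4.
r_a, r_b = 2.0, 4.0
print("read from radii:", ratio_read_from_radii(r_a, r_b))
print("read from areas:", ratio_read_from_areas(r_a, r_b))
```

The first reading recovers the factor of two in R(a,b); the second yields a factor of about four, which is the third relation discussed above.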

Misperception. Human misperception of a characteristic such as area or
radius may cause the corresponding relation, such as $S_b = 2S_a$ between the
areas of two circles, not to be recognized. In fact, psychological studies
[Tufte, 1983] have shown that the perceived area of a circle probably
grows somewhat more slowly than the actual (physical, measured) area:

the reported perceived area = $(\text{actual area})^{x}$, where $x = 0.8 \pm 0.3$.

Line length perception is another known area of human misperception.
Perceived length depends on the context and on what other people have already
said about the lines [Tufte, 1983; Asch, 1956]. The concept of the Lie Factor
(LF) was introduced to measure misperception:

LF = (size of effect shown in graphic) / (size of effect in data)

with limits (LF < 0.95, LF > 1.05) for substantial distortion [Tufte, 1983].
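As an illustration, the Lie Factor can be computed for the two circle encodings discussed earlier. This is a hedged sketch: we assume that "size of effect" is measured as a relative change between two values, and the helper names are hypothetical.

```python
def relative_change(first: float, second: float) -> float:
    """Size of effect, assumed here to be the relative change between two values."""
    return (second - first) / first

def lie_factor(effect_in_graphic: float, effect_in_data: float) -> float:
    """LF = (size of effect shown in graphic) / (size of effect in data)."""
    return effect_in_graphic / effect_in_data

def is_substantial_distortion(lf: float, low: float = 0.95, high: float = 1.05) -> bool:
    """Apply the limits LF < 0.95 or LF > 1.05 quoted in the text."""
    return lf < low or lf > high

data_effect = relative_change(2.0, 4.0)            # data values a=2, b=4
radius_effect = relative_change(2.0, 4.0)          # viewer compares radii
area_effect = relative_change(2.0 ** 2, 4.0 ** 2)  # viewer compares areas (pi cancels)

lf_radius = lie_factor(radius_effect, data_effect)
lf_area = lie_factor(area_effect, data_effect)
print(lf_radius, is_substantial_distortion(lf_radius))
print(lf_area, is_substantial_distortion(lf_area))
```

Under these assumptions the radius reading is undistorted, while the area reading of the same drawing falls well outside the quoted limits.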

Next, we illustrate misleading visual expectations in terms of visualized 
relation R(a,b) between two data sets a={x} and b={y} where for every x, 
y=2x. Figure 15(a) visualizes this relation while preserving proportion. Such 
a visualization permits the relation y=2x to be discovered. Alternatively, Figure
15(b) shows the same relation but with inconsistent, disproportional axes 
since units on the y-axis are a quarter of the size of units on the x-axis. 




(a) (b) 

Figure 15. Visualization size effect 



Thus, the plot might create a misleading visual expectation, 2y=x, of a
slowly growing y instead of 2x=y. Note that the axes are correctly marked;
however, a user expecting a uniform scale perceives a much slower growth
than is actually depicted.

Misleading visual expectations frequently arise from disproportional data
sets in graphics [Tufte, 1983, pp.65-67]:

• a tall (vertical) shape of the plot of monetary spending emphasizes 
rapid growth; 

• a short (horizontal) shape of the same plot suppresses a user’s expec- 
tations of the rapid growth; 

• visual objects located in front of other objects are perceived as
emphasized, towering, and larger than the others;

• horizontal arrows encourage an impression of a stable base;

• arrows pointing vertically emphasize growth. 

Visualization of incorrect relations. In the previous examples, guidance
could help to avoid the misperception of relations between objects. However,
guidance has its limitations: the correct relation R(a,b) must be
known. Ptolemy’s geocentric solar system depicts incorrect relations
$R_1$(Sun, Earth) and $R_2$(Sun, Moon). These relations not only appear in the
visualization as relations between circles, $Q_1$(Sun, Earth) and
$Q_2$(Sun, Moon) (see Figure 16(a)), but were also present in the original
form of relations $R_1$(Sun, Earth) and $R_2$(Sun, Moon). It is important to
notice that the original relations are also visual. Ptolemy believed that he
had seen the Moon and the Sun moving around the Earth every day. Thus, he
depicted that visual relation in his geocentric system. This example shows
that steps S1 and S2 are applicable to original relations presented either
visually or textually.

Figure 16 also serves as a side-by-side visual correlation of two models, 
quickly showing their differences (here Mercury and Venus are not named 
but are shown). This type of visual correlation emphasizes only differences. 




(a) (b) 

Figure 16. Ptolemy’s Geocentric world (a) and Copernicus’s Solar system (b) 

Approaches. Several approaches have been proposed to meet the listed
challenges [Tufte, 1983]:

• design different graphics for each perceiver in each context,

• design graphics that are correct on average for many perceivers,

• use a table instead of graphics for data sets of 20 numbers or less,

• represent graphical objects in proportion to numbers, Lie Factor = 1,

• label graphical objects to defeat graphical distortion and ambiguity.

The last three approaches are applicable only to numeric data; the first one
requires a description of the context. Of course, identifying average
perception requires many psychological studies. An alternative approach is to
use active guidance for the perceiver instead of passively relying on the
perceiver’s choice.

5.2 A visual correlation model based on intermediate 
objects 

5.2.1 The intermediate object concept

If objects A and B are given by their numeric attributes then they can be 
correlated by comparing the value of attributes, computing a measure of 
their closeness, and finally by evaluating that measure. If the measure is high 
enough then A and B are called correlated. This process can then be visual- 
ized in a variety of ways such as those presented in the previous sections. 

However, for many tasks, objects A and B are not represented directly by
their attributes, nor are relations between A and B directly and explicitly
recorded in a database. In such cases, correlation may need to be discovered
from indirect data that can be spread across different records or even
different databases. For such tasks, one approach to discovering the
correlation between A and B is to use an intermediate object B’.

This approach can be successful for tasks where discovering some relation
$R_1$ between A and B’ and another relation $R_2$ between B’ and B would be
simpler than discovering a single relation R(A,B) between A and B directly.
It is also expected that these two relations $R_1$ and $R_2$ can be combined
into a single relation between A and B without significant difficulties. Thus,
this approach requires the discovery of an intermediate object B’. Note that
relation $R_1$ between A and B’ and relation $R_2$ between B and B’ can be
quite different. This approach has a lot in common with link analysis. The
DARPA EELD program [Senator, 2001] is the most intensive recent attempt
in this area.

Example. Objects A and B are two terrorist attacks and the goal is to
correlate them, that is, to find what is common between the attacks. The
straight comparison of attributes may not reveal any correlations useful for
decision making, that is, for preventing new attacks and punishing those who
are responsible. More specifically, let a set of intermediate objects {B’} be
available and say that these objects constitute all communication intercepts
in countries where attacks A and B were committed. It is possible that one of
the intercepted messages $B'_k$ indicates that a brother of the person X
responsible for attack A called the person responsible for attack B.

This link permits one to set up a relation between the two attacks. Further
surveillance of the brother’s communication can potentially help prevent
new attacks. The names of perpetrators involved in A and B can be used to
search for links/relations in {B’}. If communication between perpetrators in
A and B and their relatives is discovered, then a correlation between A and B
is established and can be visualized. A variety of visualizations have been
developed to support link analysis. This visualization is typically carried
out using individual attributes, not through the use of complex objects as we
describe below.
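The search for links in the set of intermediate objects {B’} can be sketched as a simple filter. This is only an illustration under our own assumptions: the `Intercept` record, the notion of a "circle" (a perpetrator plus known relatives), and all names are hypothetical placeholders, not data structures from any real link-analysis system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Intercept:
    """One intermediate object B'_k: a recorded call between two people."""
    caller: str
    callee: str

def linking_intercepts(circle_a, circle_b, intercepts):
    """Return intercepts that connect someone in circle_a with someone in circle_b."""
    return [
        m for m in intercepts
        if (m.caller in circle_a and m.callee in circle_b)
        or (m.caller in circle_b and m.callee in circle_a)
    ]

# Hypothetical data: X is linked to event A, Y to event B.
circle_a = {"X", "brother_of_X"}
circle_b = {"Y"}
intercepts = [Intercept("brother_of_X", "Y"), Intercept("P", "Q")]

links = linking_intercepts(circle_a, circle_b, intercepts)
print(links)
```

A non-empty result plays the role of the intermediate object that establishes a correlation between A and B.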

5.2.2 Definitions

Definition 1. Two objects A and B from classes A and B are exactly
correlated objects if there is a homomorphism between them.

Informally, homomorphism means that relations in A have been matched
to relations in B with the same properties, that is, the structures of A and B
are similar. For more formal definitions see Section 5.3 and [Mal’cev, 1973;
Kovalerchuk & Vityaev, 2000].

Definition 2. Two objects A and B are correlated objects if there is an object
B’ (from class B) exactly correlated to A, where B’ is produced from B by
some mapping F, B’=F(B).

Definition 3. A function $\tau(B,B')$ is called the difference between B and
B’ if $\tau: \{(B,B')\} \rightarrow [0,1]$ and for every B and B’,
$\tau(B,B) = \tau(B',B') = 0$.

Definition 4. Visual correlation of two correlated objects A and B is a pair
of visualizations VC1 and VC2, where VC1 is a visualization of the similarity
(homomorphism) between A and B’ and VC2 is a visualization of the
difference between B’ and B.

Figure 17 illustrates Definition 4. 



[Figure 17 diagram, image not reproduced: A and B’ are linked by exact
correlation (homomorphism, degree of similarity); B’ and B are linked by the
degree of difference.]

Figure 17. Visual correlation using intermediate object B’
5.2.3 The mathematical structures of objects

Objects A, B and the intermediate object B’ may belong to a variety of
mathematical structures, such as linear order, lattice, tree, planar graph,
directed graph, general graph structure, and Zadeh’s linguistic variable
structure. For detailed definitions see [Birkhoff, 1979; Zadeh, 1977;
Kovalerchuk & Vityaev, 2000].

According to the definitions in the previous section, objects B and B’
should belong to the same mathematical structure B while object A can
belong to another structure A. In what follows, we give an example
demonstrating how a lattice structure A can be correlated to a linear
structure B via an intermediate linear structure B’.

Let B and B’ be subsets of a linear structure Z, $B \subset Z$,
$B' \subset Z$, Z=[0,1].

Let the lattice A be a subset of the Cartesian product X×Y,
$A \subset X \times Y$, where X=[0,1], Y=[0,1]. In other words,
$A \subset \{(x,y): x \in X, y \in Y\}$. Further, note that A is a lattice;
that is, upper and lower elements are defined for any pair of elements of A by
operations $\wedge$ and $\vee$. For instance, for (x,y) and (y,x), if
x<y then (x,x) is a lower element,

$$(x,x) = (x,y) \wedge (y,x).$$

In a similar way, the upper element is $(y,y) = (x,y) \vee (y,x)$. In general,
upper and lower elements may not belong to A.

Let B be mapped to B’ by a homomorphism F, F(B) = B’, that is, B’ may
contain fewer elements than B. Also let A be mapped to B’ by a mapping
M(A) = B’ such that every (x,y) in A is mapped to z by its largest
component, z = max(x,y). Thus, both A and B can be mapped to B’,

$$A \rightarrow B' \leftarrow B$$

by using M and F: M(A) = B’ = F(B).
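The lattice operations and the mapping M from this example can be written down directly. The sketch below is illustrative only: it assumes the component-wise order on pairs, and the sample set `A` is made up for the demonstration.

```python
def meet(p, q):
    """Lower element (p AND q): component-wise minimum of two pairs."""
    return (min(p[0], q[0]), min(p[1], q[1]))

def join(p, q):
    """Upper element (p OR q): component-wise maximum of two pairs."""
    return (max(p[0], q[0]), max(p[1], q[1]))

def M(p):
    """Map a pair (x, y) to the linear structure by its largest component."""
    return max(p)

x, y = 0.2, 0.7
lower = meet((x, y), (y, x))   # (x, x) when x < y
upper = join((x, y), (y, x))   # (y, y) when x < y

# A hypothetical finite subset A of [0,1] x [0,1] and its image B' = M(A).
A = [(0.2, 0.7), (0.7, 0.2), (0.1, 0.4)]
B_prime = sorted({M(p) for p in A})
print(lower, upper, B_prime)
```

Note that two different elements of A, (0.2, 0.7) and (0.7, 0.2), are mapped to the same element of B’, so M loses the lattice distinction between them.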

5.3 Algebraic relational approach for defining 
correlation 

In Section 5.2.2 we defined the exact correlation of two objects A and B
from classes A and B based on the homomorphism between them. This
generalized concept is available for describing the correlation of a variety
of complex objects, and it also permits the description of a classical linear
correlation, as we show below. The concept of classes A and B has not yet
been formally defined. This was done deliberately because the class can be
task specific. One very general concept of class derives from abstract
algebra, which provides the concept of a relational structure, also known as a
model, and the concept of the algebraic system [Mal’cev, 1973]. Another even
more abstract concept of class can be derived from the mathematics of category
theory [Marquis, 2004]:

Category theory now occupies a central position not only in 
contemporary mathematics, but also in theoretical computer science and 
even in mathematical physics. It can roughly be described as a general 
mathematical theory of structures and systems of structures.

Below we will define and use the concepts of relational structure (model) 
and algebraic system. 

Definition 5. A pair $A = \langle a, \Omega_a \rangle$ is a relational
structure (model) if a is a set of objects and $\Omega_a$ is a set of
relations (predicates) $P_i$ on Cartesian products of a, such that

$$P_i: a \times a \times \dots \times a \rightarrow \{0,1\},$$

that is, for every vector $(a_1, a_2, \dots, a_n)$,
$P_i(a_1, a_2, \dots, a_n) = 0$ or $P_i(a_1, a_2, \dots, a_n) = 1$.

Definition 6. A pair $A = \langle a, \Omega_a \rangle$ is called an algebraic
system if a is a set of objects and $\Omega_a$ is a set of predicates $P_i$
and operators $F_i$ on Cartesian products of a, such that
$F_i(a_1, a_2, \dots, a_n) = a_{n+1}$, that is

$$P_i: a \times a \times \dots \times a \rightarrow \{0,1\}, \quad F_i: a \times a \times \dots \times a \rightarrow a.$$

Thus, a relational structure is a special case of an algebraic system; that
is, one without operators. Set $\Omega_a$ is called a signature of A.

Definition 7. Let two algebraic systems $A = \langle a, \Omega_a \rangle$ and
$B = \langle b, \Omega_b \rangle$ be given, where
$\Omega_a = \langle \{P_i\}, \{F_i\} \rangle$ and
$\Omega_b = \langle \{Q_i\}, \{G_i\} \rangle$, where $P_i$, $Q_i$ are
predicates and $F_i$, $G_i$ are operators. A mapping
$\varphi: a \rightarrow b$ from A to B is called a homomorphism if for every
vector $(a_1, a_2, \dots, a_n)$

$$P_i(a_1, a_2, \dots, a_n) = Q_i(\varphi(a_1), \varphi(a_2), \dots, \varphi(a_n)),$$

$$\varphi(F_i(a_1, a_2, \dots, a_n)) = G_i(\varphi(a_1), \varphi(a_2), \dots, \varphi(a_n)).$$

In classical linear regression, suppose f(x) correlates two ordered arrays
$\{x_i\}$ and $\{y_i\}$ such that $|y_i - f(x_i)| < \varepsilon_i$, where the
value of $\varepsilon_i$ can vary for different i. In classical correlation
analysis, this function f is called a regression function.

The idea of correlation is that by using f we can judge relations in
$\{y_i\}$ by knowing relations in $\{x_i\}$. For instance, if $x_2 < x_3$ then
we should be able to say that $y_2 < y_3$ with some level of confidence.
Correlating algebraic systems $A = \langle a, \Omega_a \rangle$ and
$B = \langle b, \Omega_b \rangle$ has exactly the same goal. We build a
function f from a to b with $|f(a_i) - b_i| < \varepsilon_i$, where
$\varepsilon_i$ is small or equal to zero and where, knowing properties in A,
we use f to judge properties in B. In classical correlation, the set of such
properties is not formulated explicitly. Algebraic systems permit writing such
properties explicitly. For instance, we may want to be sure that additivity in
A is preserved in B; that is, we may postulate an additive operator
$F(a_1,a_2)$ in A, $F: a \times a \rightarrow a$,

$$F(a_1,a_2) = kF(a_1) + mF(a_2),$$

and an additive operator $G(b_1,b_2)$ in B, $G: b \times b \rightarrow b$,

$$G(b_1,b_2) = sG(b_1) + tG(b_2).$$

If we match $a_1$, $a_2$ and $b_1$, $b_2$ by a correlation function f, this
function should also match elements produced by combining these pairs using
$F(a_1,a_2)$ and $G(b_1,b_2)$:

$$f(F(a_1,a_2)) = G(f(a_1), f(a_2)) = G(b_1,b_2).$$



This is exactly the property that is enforced by homomorphism. Thus, the
following theorem can be formulated.

Theorem. If there is a homomorphism $\varphi$ between A and B then
$\varepsilon_i = 0$ and thus A and B are exactly correlated (see the
definition in Section 5.2.2).

If $\varepsilon_i \neq 0$ then the correlation mapping f differs from the
homomorphism $\varphi$. In this case, the correlation function f serves the
role of a pseudo-homomorphism. It is also possible that a homomorphism from A
to B simply does not exist, that is, any mapping from A to B will violate some
properties of A. A formal definition of pseudo-homomorphism can depend on the
specific properties of systems A and B. For instance, if the properties of A
and B are only in a predicate true/false form, then we can measure the number
of violations under mappings f and $\varphi$. If A and B contain some metric
properties, then we can measure how significant the violation is in terms of
the distance. The actual choice should depend on a particular application.
Thus, pseudo-homomorphism formalizes a less restricted concept of correlation.
The general concept of homomorphism and pseudo-homomorphism can be
applied to a variety of structures with relations and operators.
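For predicates in true/false form, counting violations is straightforward. The sketch below is one possible way to do it, not a definition taken from the text: the function name, the dictionary representation of the mapping, and the sample data are all our own assumptions.

```python
from itertools import product

def count_violations(elements, phi, pred_a, pred_b, arity):
    """Count argument vectors where pred_a(args) != pred_b(phi(args)).

    Zero violations means phi preserves this predicate, as a homomorphism
    requires; a positive count measures how far phi is from one.
    """
    violations = 0
    for args in product(elements, repeat=arity):
        if pred_a(*args) != pred_b(*(phi[x] for x in args)):
            violations += 1
    return violations

def leq(u, v):
    """Binary order predicate used in both structures."""
    return u <= v

a = [1, 2, 3]
phi_monotone = {1: 10, 2: 20, 3: 30}   # preserves the order
phi_swapped = {1: 10, 2: 30, 3: 20}    # swaps the images of 2 and 3

print(count_violations(a, phi_monotone, leq, leq, arity=2))
print(count_violations(a, phi_swapped, leq, leq, arity=2))
```

The monotone mapping produces no violations, while the second mapping violates the order predicate exactly on the pairs (2,3) and (3,2).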

Example. Suppose we want to correlate human height and weight using a
dataset of heights of five people $a = \{a_1, a_2, a_3, a_4, a_5\}$ and a
dataset of weights of the same five people
$b = \{b_1, b_2, b_3, b_4, b_5\}$ to figure out if there is any
correlation between weight and height. Let A be a relational structure

$$A = \langle a, >, \geq_h, P_a(a_1,a_2,a_3,a_4) \rangle,$$

where > indicates that subjects are ordered (indexed) and $\geq_h$ indicates a
greater than or equal to relation for human height.

Let B be another relational structure

$$B = \langle b, \geq_w, P_b(b_1,b_2,b_3,b_4) \rangle,$$



where $\geq_w$ indicates a greater than or equal to relation for human weight.
Here predicates $P_a$ and $P_b$ are the following:

$$P_a(a_i, a_j, a_k, a_m) \Leftrightarrow a_i - a_j \geq_h a_k - a_m,$$

$$P_b(b_i, b_j, b_k, b_m) \Leftrightarrow b_i - b_j \geq_w b_k - b_m.$$

Let $\varphi$ map elements of A to B as follows: $a_i \rightarrow b_i$. We can
test if A and B are homomorphic. Assume that it is true. We need to understand
how this can help us in correlating human weight and height. According to the
homomorphism, if the difference between weights for two people (i,j) is
smaller than for two other people (k,m), then the difference between heights
for the first pair is also smaller. If persons i and k are the same person,
then we can judge the relation between persons j and m relative to person i.
We can state that if j is taller than m (the difference in height between j
and i is greater than between m and i), then j is heavier than m (the
difference in weight between j and i is greater than between m and i). Thus,
we are able to correlate weight and height. This also can be converted to a
more traditional numeric form, but for many non-numeric relations this is a
natural form of correlation, and visualization should be able to represent
such relations. Visual correlation techniques based on an intermediate element
are one potential approach to consider for this.
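The test of whether the height and weight structures are homomorphic can be sketched in code. This is an illustration under stated assumptions: we take the predicate to compare differences with "greater than or equal to", we use the mapping $a_i \rightarrow b_i$ by index, and all numeric values are invented.

```python
from itertools import product

def difference_predicate(values):
    """Build P(i, j, k, m): values[i] - values[j] >= values[k] - values[m]."""
    def pred(i, j, k, m):
        return values[i] - values[j] >= values[k] - values[m]
    return pred

def violations(heights, weights):
    """Index quadruples on which the height and weight predicates disagree."""
    p_a = difference_predicate(heights)
    p_b = difference_predicate(weights)
    indices = range(len(heights))
    return [q for q in product(indices, repeat=4) if p_a(*q) != p_b(*q)]

# Hypothetical data for five people (cm and kg).
heights = [160, 170, 175, 180, 190]
weights_linear = [h - 100 for h in heights]   # weight differences mirror height differences
weights_noisy = [55, 68, 72, 80, 95]          # same ranking, different spacing

print(len(violations(heights, weights_linear)))
print(len(violations(heights, weights_noisy)))
```

With the first weight list the mapping preserves the predicate on every quadruple. With the second list some quadruples disagree, for example (2, 1, 3, 2), so the index mapping is at best a pseudo-homomorphism in the sense described above.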



6. CONCLUSION 

This chapter identified the concept of visual correlation. The challenges in 
visual correlation include how to correlate visually conflicting data and data 
with different levels of resolution and how to make a “rich” visual correla- 
tion for portraying the differences between objects. Visual correlation of ob- 
jects and events has not yet defined itself as a separate field. It has a lot in 
common with visualization, visual data mining, statistics, and the general 
decision-making process. A variety of methods has been developed inde- 
pendently in different fields with little or no communication and without 
common terminology. A significant amount of generalization work remains to be
done. In this chapter, we provided a review, preliminary structure,
classification and formalizations for visual correlation methods and criteria
to assess the quality of visual correlation.
8. EXERCISES AND PROBLEMS

1. Discuss differences between concepts of Correlation, Fusion, and 
Semantic Integration defined in Section 1.2. Provide examples for each 
category. 

2. Discuss differences between three levels of correlation (high, medium 
and low) defined in Section 1.3. Provide examples for each category. 

Tip: modify examples presented in Section 2. 

3. Expand Table 4 from Section 3 with more metaphors for visual correla- 
tion tasks. 

4. Provide your own example of relation distortion similar to the one pre- 
sented in Figure 14 in Section 5.1. 

5. Provide an example of visual correlation using intermediate object B’. 
Your example should be consistent with definitions given in Sections
5.2.2-5.2.3 and illustrated in Figure 17.

Advanced 

6. Try to visualize an algebraic form of the example presented in Section
5.3. Tip: start from visualization of classical linear regression and
visualize algebraic relations (6) and (7) as a part of this exercise.
ICONIC APPROACH FOR ANNOTATING,
SEARCHING, AND CORRELATING



Boris Kovalerchuk 

Central Washington University, USA 



Abstract: This chapter presents the current state of the art in iconic
descriptive approaches to annotating, searching, and correlating that are
based on the concepts of compound and composite icons, the iconic annotation
process, and iconic queries. Specific iconic languages used for applications
such as video annotation, military use and text annotation are discussed.
Graphical coding principles are derived through the consideration of questions
such as: How much information can a small icon convey? How many attributes can
be displayed on a small icon either explicitly or implicitly? The chapter also
summarizes the impact of human perception on icon design.

Key words: iconic representation, compound icon, composite icon, iconic query,
iconic sentence, graphical coding



1. INTRODUCTION 

The application of iconic descriptive approaches and languages has
proven useful for annotating, searching, and correlating traditional databases,
and those containing images and multimedia [Chang et al., 1987; 1989;
1994; 1996; Davis, 1995]. In this section, we provide an overview of the
state of the art in this arena.

1.1 An overview of Media Streams 

We begin this review with a look at Media Streams, a system developed at
the MIT Media Laboratory [Davis, 1995] for video annotation. This system
contains about 3000 predefined icons. The icons are organized into a semantic
hierarchy (ontology). These icons serve as building blocks for ordinary
compound icons, which are composed of as many as three icons placed
side-by-side. Table 1 shows icons that are similar but not identical to those
used in Media Streams since our goal is only to present Media Streams
conceptually. In essence, a user selects the desired icons from a semantic
hierarchy of icons to build an iconic sentence. Media Streams does not permit
complex combinations of icons such as superimposing one icon on another
with possible resizing or color change. Each concept is encoded in an icon,
and more complex concepts such as “two cars” are encoded using two icons,
“car” and “two”. Similarly, “three blue birds” is encoded as “bird” and “blue
three”, and “two adult female dentists” is represented by three icons: “adult
female”, “dentist” and “two”.



Table 1. Examples of ordinary compound icons

Ordinary compound icon        Description

[icons not reproduced]        Two cars

[icons not reproduced]        Three blue birds

[icons not reproduced]        Two adult female dentists



An annotation is defined as a graphical descriptor of a segment of the
video content, composed of a compound icon and an adjacent color bar that
indicates the end and the length of the segment. The color of the bar
corresponds to the colors of the compound icon (see Figure 1).



Figure 1. Ordinary compound icons “two cars” and “three birds” with their time lengths.

The semantic hierarchy of icons includes characters, objects, screen
positions, relative positions, character actions, and object actions. Glommed
icons are a type of compound icon that combines up to three ordinary compound
icons across different descriptor hierarchies. Table 2 provides an example of
a glommed icon. In addition, animated icons can be used to express character
actions or object actions dynamically.
Table 2. Examples of glommed compound icons

Glommed compound icon         Description

[icon not reproduced]         An adult male using his hand to operate a gun
                              (black circle in hand indicates an object)



1.2 The iconic annotation process in Media Streams

The iconic annotation process in Media Streams consists of three major 
steps: 

• selecting icons, 

• assembling compound icons, and 

• filling lines with selected compound icons. 

The first step includes an analysis of an input shot. Table 3 shows an exam- 
ple illustrating this with the statue of an adult male using his hand to operate 
a gun. 



Table 3. Selecting icons, assembling compound icons and filling lines

Selecting icons and assembling a compound icon | Filling annotation lines | Icon description

[video input and icons not reproduced] | [annotation line not reproduced] | Location icon line: Earth ground icon, outdoor icon

[icons not reproduced] | [annotation line not reproduced] | Character line: Adult man icon with attached descriptor “Jon”

[icons not reproduced] | [annotation line not reproduced] | Character’s action icon line: Jon using his hand to operate a gun



A user can select appropriate icons from a hierarchical menu of icons in the
GUI. The first line in Table 3 shows the location of the shot with two icons,
“Earth ground” and “outdoor”, selected by the user.
The second icon specifies the first one and sits below it in the ontology
hierarchy. The Media Streams time line uses a logarithmic time scale to
shorten the description. Filling time lines requires some experience with the
system. Usability studies have shown that after two weeks people are
comfortable enough to make annotations [Davis, 1995].



2. ICONIC QUERIES 



An iconic query system built in [Narayanan & Shaman, 2002] implements a
restricted subset of SQL commands for querying a database. The query
language's terms (such as verbs, adjectives) are represented by icons.
To construct a query, the user composes structured iconic expressions
according to the grammar of the iconic language. The language and grammar
include rules for iconic constructions based on combined icons.

The goal of such iconic queries is retrieving information from traditional
text-based databases. Complex query design is often beyond the skills of
ordinary users and users with disabilities who cannot write. Iconic queries
can permit these users to retrieve information from a database without the aid
of a programmer. There is also hope that an experienced user can assemble
and debug iconic queries faster than text-based SQL commands. The same
reasons apply even more to multimedia databases.
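To make the idea of an iconic front end to a restricted SQL subset concrete, the following sketch translates three selected icons into one parameterized SELECT statement. It is not the grammar of the cited system: the icon names, the lookup tables, and the `people` table are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical icon vocabularies: each icon maps to a fixed column or table name.
ICON_TO_COLUMN = {"person": "name", "city": "city"}
ICON_TO_TABLE = {"people": "people"}

def icons_to_sql(select_icon, table_icon, filter_icon):
    """Translate three icons into a SELECT ... WHERE ... = ? statement."""
    column = ICON_TO_COLUMN[select_icon]
    table = ICON_TO_TABLE[table_icon]
    filter_column = ICON_TO_COLUMN[filter_icon]
    return f"SELECT {column} FROM {table} WHERE {filter_column} = ?"

# Small in-memory database for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("Ann", "Oslo"), ("Bob", "Rome")])

sql = icons_to_sql("person", "people", "city")
rows = conn.execute(sql, ("Oslo",)).fetchall()
print(sql)
print(rows)
```

Because identifiers come only from the fixed icon dictionaries and the filter value is passed as a parameter, the user never types SQL text, which is the point of an iconic query interface.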



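Table 4. Example of iconic query and answer retrieval

Visual query | Descriptors | Best matched iconic result | Best matched video shots

[icon not reproduced] | Location: outdoor | [icon not reproduced] | [video shots not reproduced]

[icon not reproduced] | Character: adult man | [icon not reproduced] |

[icon not reproduced] | Action: man operates gun by hand | [icon not reproduced] |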
An iconic query is a “native way” of querying multimedia databases. In
multimedia, iconic queries support searching the space of iconic annotations 
for those icons annotating multimedia that satisfy the search condition. 

Below we illustrate an iconic search process in the Media Streams and 
Bruegel systems. We begin with the Media Streams approach assuming that 
the compound icons shown in Table 4 represent an iconic query to find shots 
in an image database that show “An adult male using his hand to operate a 
gun.” 

The first column in Table 4 shows a sample iconic query: “Find a shot:
(1) outdoor, (2) an adult male (3) using his hand to operate a gun.” The
second column shows the best-matched iconic results, both of which are
identical to the query, and the last column shows the two best-matched video
shots. There is a complete match for both shots with this iconic query. A more
sophisticated query would probably differentiate between a real male and a
statue. Table 5 shows a sample query without the outdoor requirement.

Table 5. Examples of iconic queries 

Text and iconic query Search results 

An adult male using his hand to 
operate a gun 




Table 6. Complex queries

Visual query | Content

[icons not reproduced] | OR query: Find a compound icon that contains an object OR a character

[icons not reproduced] | AND query: Find a compound icon that contains an object AND a character

[icons not reproduced] | Time overlap query: Find a compound icon that contains the object, which is temporally overlapping with a character
Media Streams permits complex queries produced by using AND, OR
and Time-Overlapping operators. Table 6 shows some examples. Media 
Streams also supports queries with exceptions learned from previous retriev- 
als. Table 7 shows a standard iconic query “Find an adult male using his 
hand to operate a gun” OR exception: “Find an adult male using his hand to 
operate a telephone.” Pointer links are used to set up a query with exception. 
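The three query operators can be illustrated on a toy annotation store. This sketch is not the Media Streams implementation: the `Annotation` record, its fields, and the sample time intervals are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    """One annotated segment: descriptor hierarchy, label, and time interval."""
    hierarchy: str   # e.g. "object" or "character"
    label: str
    start: float
    end: float

def overlaps(x, y):
    """True if the two time intervals share some time span."""
    return x.start < y.end and y.start < x.end

def or_query(annotations, h1, h2):
    """Annotations belonging to hierarchy h1 OR hierarchy h2."""
    return [a for a in annotations if a.hierarchy in (h1, h2)]

def and_query(annotations, h1, h2):
    """True if both hierarchies are present among the annotations."""
    present = {a.hierarchy for a in annotations}
    return h1 in present and h2 in present

def time_overlap_query(annotations, h1, h2):
    """Pairs (x, y) with x in h1 and y in h2 whose intervals overlap."""
    return [(x, y) for x in annotations if x.hierarchy == h1
            for y in annotations if y.hierarchy == h2 and overlaps(x, y)]

shot = [
    Annotation("character", "adult male", 0.0, 4.0),
    Annotation("object", "gun", 2.0, 5.0),
    Annotation("object", "car", 6.0, 8.0),
]
pairs = time_overlap_query(shot, "object", "character")
print([(x.label, y.label) for x, y in pairs])
```

Here only the gun overlaps in time with the character; the car appears after the character's segment has ended, so it satisfies the OR and AND queries but not the time overlap query.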

The Bruegel system presented in Chapter 10 specifically supports 
iconized records and queries that are designed as an extension of MS Access 
(see Figure 2). Further, this system intends to support both compound and 
composite icons in queries and description of actual records. We discuss the 
Bruegel system and its composite icons in the next section. 



Table 7. Example of query with exception 




[Table 7 icons not reproduced.]





Figure 2. Bruegel iconized records and queries 
3. COMPOSITE ICONS



Table 8 illustrates the composite icon concept used in the Bruegel system.
Base icons are used to build composite icons by superimposing one base
icon over another base icon. This is the way the “damaged truck” icon is
composed. Next, this composite icon and a base icon for “uncertainty” are
combined to produce another composite icon for the concept “Truck damage
questionable.” There is a limit to how many concepts can be incorporated in
a single composite icon. We discuss this issue below.



Table 8. Examples of Bruegel composite icons. See also color plates.

Base icons: Truck; Damage; Uncertainty

Composite icons: Damaged truck; Truck damage questionable

[Icons not reproduced.]



Table 9 illustrates two composite icons that are generated by different se- 
quences of superimposed icons. The meaning of the icon “the key over the 
envelope” can differ from the meaning of the icon “the envelope over the 
key.” The first icon can mean a “secure message” and the second icon can 
mean a “protected security key.” 

Table 9. Composite vs. compound icons

Compound icons | Combined icons | Contents

[icons not reproduced] | [icon not reproduced] | Secure message

[icons not reproduced] | [icon not reproduced] | Message on security issue



We use notation AJB to indicate a composite icon, where icon B is 
posted on top of icon A, and notation BJA is used for the composite icon 
where icon A is posted on top of icon B. Note that the icon posting opera- 
tion J is not commutative. 



BJA ≠ AJB
One of the difficulties of building a composite icon by superimposing one
icon over another is that a part of the underlying icon can be obscured. We
discuss this issue and the formal Bruegel visual language (BVL) below.

The difference between composite icons (Bruegel) and compound icons 
(Media Streams) is illustrated in Table 9. In Media Streams, “secure mes- 
sage” is iconized by a sequence of two icons. In Bruegel, just one icon is 
used that takes less space. 

Table 10 illustrates how four base icons, “message,” “template,”
“security,” and “ID,” can be combined pairwise to produce 16 composite
icons with different meanings.
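
The pairwise count can be checked with a minimal sketch. It assumes, for illustration only, that a composite icon is modeled as an ordered (bottom, top) pair and that the 16 icons are all ordered pairs of the four base icons, including an icon posted over itself; the function name post is hypothetical and is not part of Bruegel.

```python
from itertools import product

# The four base icons named in the text.
BASE_ICONS = ["message", "template", "security", "ID"]


def post(bottom: str, top: str) -> tuple:
    """Model a composite icon as an ordered pair: `top` is posted over `bottom`."""
    return (bottom, top)


# All ordered pairs of base icons (assumption: self-pairs are included).
composites = [post(a, b) for a, b in product(BASE_ICONS, repeat=2)]

print(len(composites))
# Posting is not commutative: the order of the two icons matters.
print(post("message", "security") == post("security", "message"))
```

Because the pair is ordered, swapping the two icons yields a different composite, which mirrors the non-commutative posting operation described above.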



Table 10. Composite icon generation. See also color plates

Row and column base icons: message; template; security; ID






Figure 3 shows a fragment of the iconic language used in Microsoft Win- 
dows applications. These icons illustrate the current practice of posting 
iconic elements (iconels) over background icons. In Figure 3, iconels for 
different aspects of communication are posted on the base icon with two 
human profiles that represents a communication icon. 
Iconel labels: Request information; Request result unknown; Response
positive; Response negative; Request canceled

Figure 3. Posting aspect iconels



The “shortcut” iconel (shown in Figure 4) posted over a variety of base
icons (appointments, contacts, journal entry, message, note, office
document, and task) gives another example of posting iconels.



Icon labels: New Contact; New Journal Entry; New Message; New Note; New
Office Document; New Task (each carrying the “shortcut” iconel)

Figure 4. Posting “shortcut” iconels



A thorough analysis of the actual use of icons in Windows Office
applications reveals that iconels are not posted very consistently. This
consequently puts a higher load on a user’s memory than consistent
posting would.



4. MILITARY ICONIC LANGUAGE 



The Military Standard 25-25 [http://symbology.disa.mil/symbol/mil- 
std.htm] considers an icon to be the innermost part of a (tactical) symbol 
that provides a graphic representation of a war fighting object (see Figure 5). 
Below we will call this standard Mil 25-25. A tactical symbol contains more 
textual information than symbols in pure iconic languages. All right and left 
fields of a tactical symbol are textual. Thus, it would be more correct to call 
it a mixed textual iconic language. 





(a) Structure of a tactical symbol (b) Example 

Figure 5. Icon concept of Mil 25-25 [http://symbology.disa.mil/symbol/mil-std.htm] 
Table 11 contains a summary of an upper level ontology of this language.
War fighting objects have two major categories: equipment and military 
units. An object is characterized by its (1) attributes, (2) location, (3) time, 
(4) evaluation ratings and (5) actions. Some of these categories are highly 
elaborated having lower level ontologies that are described by more than 500 
pages in Mil 25-25. 

This ontology has two major differences from other iconic languages: (1) 
a highly elaborated hierarchy of the icons for objects and (2) much less 
elaborated icons for actions. Only movement, direction, and speed can be 
represented in the tactical symbol directly. 

The language was designed to be used for tactical needs. A fighter’s ac- 
tion in the battlefield is often derivable from its name - fighting. That is, a 
direct icon for such action can often be omitted. Similarly for visual correla- 
tion tasks, when only the result of an action is to be shown, direct use of 
icons for these actions can be avoided. 

The example in Figure 5(b) represents a hostile fighter (ID AJ2455) mov- 
ing in the air to the southeast. A hostile indicator is presented by the red 
color and duplicated by the use of the “hostile” shape in case the icon is used 
in a black and white presentation. A fixed wing fighter indicator is repre- 
sented by letter “F” and “in the air” is represented by the shape with absence 
of the bottom frame. The southeastern movement is encoded by the arrow. 



Table 11. Upper level ontology of the tactical symbology language

Concept | Representation | Description

Object category | Text and icon | Affiliation (friend, neutral,
hostile...). Type: Equipment, Military units (infantry, motorized,
reconnaissance, airborne, outpost, etc.), Installation, e.g., military base

Object quantity/size | Text and icon | Equipment quantity, echelon
(military unit size indicator)

Object ID | Text | Object unique ID

Combat effectiveness | Text | Completely effective/capable, almost fully
effective/capable, fairly effective/capable, effectiveness cannot be
judged, effectiveness doubtful, ineffective.

Location | Text and icon | Battle dimension (air, space, ground, sea
surface...); altitude/depth; degree, minute, and seconds. Status: current,
anticipated/planned (indicated by frame)

Action | Text and icon | Movement: speed, direction

Date/time | Text |

Evaluation ratings | Text | Reliability rating: completely reliable,
usually reliable, not usually reliable... Credibility rating: confirmed by
other sources, probably true, improbable...

The major element of the Mil 25-25 iconic language is the shape. Shapes
are used to encode affiliations (friend, hostile, neutral, ...).
Affiliation is also encoded by color, so the shape duplicates the color
information (for color blind people, black and white monitors, and
drawing). The shape encodes not only affiliation but two more
characteristics: battle dimension (air, space, ground, ...)
and type (units, equipment, installations, ...). Figure 6 shows the shape for 
hostile air track equipment along with the icon hierarchy. The whole set of 




Figure 6. Fragment of the air track iconic hierarchy 



In the Mil 25-25 standard, there is no independent fixed iconic element to 
represent location categories such as ground, air, space, sea surface, and sub- 
merged. These graphics are context dependent. Other characteristics such 
as object’s affiliation (friend, hostile, unknown, ...) and type (unit, equip- 
ment, installation, ...) impact location graphics. This is a significant dif- 
ference between these context-dependent icons and the context independent 
icons designed in Media Streams. 

In Mil 25-25 global locations such as ground, air, and space are indicated 
by the icon frame. There is no bottom frame for air and space (see Figure 6 
and Table 12). There is no upper frame for subsurface and there is a full 
frame for ground and sea surface. 
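
The frame rule above can be written as a small lookup. This is only a sketch under a simplifying assumption: a frame is treated as a set of four named edges, whereas the real Mil 25-25 frames are affiliation-dependent shapes. The name frame_edges and the edge labels are hypothetical.

```python
# Simplifying assumption: a frame is a set of four edges.
FULL_FRAME = frozenset({"top", "bottom", "left", "right"})


def frame_edges(battle_dimension: str) -> frozenset:
    """Return the frame edges drawn for a given global location."""
    if battle_dimension in ("air", "space"):
        return FULL_FRAME - {"bottom"}   # no bottom frame
    if battle_dimension == "subsurface":
        return FULL_FRAME - {"top"}      # no upper frame
    if battle_dimension in ("ground", "sea surface"):
        return FULL_FRAME                # full frame
    raise ValueError(f"unknown battle dimension: {battle_dimension}")


for dimension in ("air", "subsurface", "ground"):
    print(dimension, sorted(frame_edges(dimension)))
```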

Table 12 illustrates the difference between the two iconic languages. The
Media Streams iconic language sets up an individual icon for each of the
concepts “in air” and “plane” and has no icons for affiliation. Mil 25-25
is more compact: it encodes three characteristics in a single icon, but it
is a more complex language to learn. It also has a very elaborate set of
icons for different types of planes (fighter, bombers, drone, ...) as
shown in Figure 6.
Table 12. Comparison of syntax of Mil 25-25 and Media Streams

Icons | Content | Iconic language system

Composite icon | Affiliation: hostile (shape and red); Object: plane;
Location: in air (no bottom frame) | Mil 25-25

Compound icon | Object: plane; Location: in air | Media Streams

Composite icon “motorized infantry” | Object “infantry” + modifier
“motorized” | Mil 25-25




The Bruegel system incorporated the idea of utilizing shapes of icon
frames from Mil 25-25 as a base for neutral objects. Note that Mil 25-25
uses only a few colors. These colors represent the basic affiliation
(threat) of war fighting objects: (1) yellow for unknown, (2) blue for
friend, (3) green for neutral, and (4) red for hostile objects. Two more
colors (purple and brown) are used in the meteorological part of Mil
25-25. This is probably sufficient for tactical battlefield symbols on the
map, but for more general database visualization and visual correlation of
objects and events (O/E), the use of more colors can be beneficial. Use of
similar colors for O/Es with similar attributes can help to facilitate
visual correlation.
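
As an illustration of coloring O/Es by shared attributes, the sketch below keeps the four affiliation colors listed above and assigns one extra palette color per distinct attribute value. The function color_by_attribute, the palette, and the sample records are hypothetical; they are not taken from Mil 25-25 or from Bruegel.

```python
# Affiliation colors as listed in the text.
AFFILIATION_COLOR = {
    "unknown": "yellow",
    "friend": "blue",
    "neutral": "green",
    "hostile": "red",
}


def color_by_attribute(records, attribute, palette):
    """Give every distinct value of `attribute` one color, in first-seen order."""
    colors = {}
    for record in records:
        value = record[attribute]
        if value not in colors:
            if len(colors) == len(palette):
                raise ValueError("palette too small for the distinct values")
            colors[value] = palette[len(colors)]
    return colors


# Hypothetical records: events with the same type share a color.
events = [{"type": "bombing"}, {"type": "kidnapping"}, {"type": "bombing"}]
print(color_by_attribute(events, "type", ["orange", "cyan"]))
```

Records with the same attribute value receive the same color, so similar O/Es can be spotted at a glance; the size of the palette should stay within the limits on effective color codes discussed later in this chapter.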

Next, Mil 25-25 uses a significant number of textual indicators, e.g.,
“FIRE”. Such text must be read and mentally matched with a real-world
object, which takes time. In contrast, an intuitive fire icon (used in the
ISO standard for flammable materials) appeals directly to the real-world
object, fire. In addition, it is not easy to combine icons from Mil 25-25
into a composite icon such as truck fire or building fire. It may require
repositioning and resizing the text “FIRE” as well as the icons for truck
and building, as suggested in Mil 25-25. Without resizing and
repositioning, the icons for fire and vehicle will overlap and one of them
will not be visible.

The Mil 25-25 language also uses a relatively small number of basic
shapes. As we already mentioned, shapes represent combinations of

• location (above surface, ground surface, sea surface, under surface, and
unknown type of location),

• type of object (military unit, equipment, installation), and

• basic affiliation/battle dimension: the threat posed by the war fighting
object (friend, unknown, neutral and hostile). Recall also the colors for
affiliations described above.

These shapes are relatively simple (square, rectangular, rectangular over 
rectangular, “diamond,” “cloud,” “arrow,” “parabola”) and often intuitive for 
users. A variety of modifiers is applied to these base icons (see Figures 5, 6 
and Table 13). 

The standard uses a variety of textual and graphical modifiers to extend
the number of alternatives covered. A modifier is defined as optional
text or graphics that provides additional information about a symbol or
tactical graphic. For instance, to include more threat affiliations
(assumed friend, pending, joker, and faker), the alphabetical symbols “J,”
“K,” and “?” are attached to icons. Mil 25-25 also includes an ontology
for military operations other than war. Even a brief analysis of these
icons shows that this ontology is very limited for visualizing such events
as terrorist attacks. This limitation of Mil 25-25 explains our intention
to develop an iconic language that will permit the visual correlation and
analysis of complex events such as terrorist attacks (see Chapter 10 for
more detail).



Table 13. Graphical modifiers

Characteristics | Iconic indicator

Direction | Arrow

Equipment | Squared dots, wave,

Feint or dummy | Dash lines

Task force | Rectangle

Headquarters staff | Flag

Installation | Small filled rectangle

Echelon (team/crew, ... army...) | I, II, III, X, XX, XXX, ..., XXXXXX, ...



5. ICONIC REPRESENTATIONS AS TRANSLATION 
INVARIANTS 

Sophisticated machine translation typically requires that the deep content
of a sentence be available. Below we illustrate how an iconic
representation can encode this deep content. The icon shown in Figure 7
annotates the statement “A robber runs out quickly with a bag.”
Figure 7. Icon “A robber runs out quickly with a bag.”

A straight word-for-word Russian translation is a meaningful sentence:
“Grabitel' ubegaet bystro s meshkom.” Now let us consider the Russian
sentence “Ubegaet grabitel' na vseh parah s sumoi za pravym plechom.” An
English word-for-word translation will produce the sentence “Runs out a
robber on all pares with a bag agree right by shoulder,” which is
nonsense. The deep linguistic content cannot be translated word-for-word.
Russian permits a free word order, and this sentence starts with the verb.
Next, the sentence contains two phrases, “na vseh parah” and “s sumoi za
plechom.” In addition, the word “za” has three English meanings,
“behind,” “agree,” and “instead of,” depending on context. Now assume
that this Russian text is augmented with the icon shown in Figure 7. In
this case, an English speaker hardly needs a translation, thereby avoiding
both the misleading word-for-word translation and the significant effort
associated with sophisticated translation. This is especially true when
templates are used in both languages instead of free text. This idea was
behind Blisssymbology, designed by Bliss in the 1940s [Bliss, 1978, 2000],
but it is still far from being fully utilized.



6. GRAPHICAL CODING PRINCIPLES 



6.1 How much information can a small icon convey? 

Below we analyze how much information can be reasonably conveyed in
a single small icon. This is a critical issue for the success of the
entire enterprise of iconic language design. Specifically, we are
interested in clarifying answers to the questions:

(Q1) How many attributes can be explicitly encoded in a small icon?

(Q2) How many attributes can be implicitly encoded in a small icon?

We start with question Q1 for the 32 x 32 pixel icons that are widely used
in software design and provide us with empirical material for study.
Figure 8(a) shows the icons “find,” “find and replace,” and “find again”
from Borland C++ software. The first icon encodes one concept, “find,”
using a flashlight metaphor. The second icon encodes two concepts
graphically: the first icon’s concept and “replace,” by adding the text
“A→B.” Similarly, the third icon encodes two concepts: the first icon’s
concept and “again,” by adding an ellipsis (“...”). Figures 8(b) and 8(c)
provide other examples of small icons that encode 1-2 attributes each. For
instance, Figure 8(b) shows an “iconic sentence” that describes ways to
manipulate the compiler and linker. This sentence differs structurally
from traditional sentences in natural languages, but it is still the
description of a complex object. All three parts of Figure 8 keep
meaningful metaphors.













(a) Borland C++ icons “find,” “find and replace,” and “find again”

(b) Microsoft Visual C++: build, rebuild, stop build

(c) Insert/remove a breakpoint; insert/remove all breakpoints;
enable/disable a breakpoint; disable all breakpoints

Figure 8. Icon analysis

Figure 9(a) shows Microsoft Visual C++ icons that accommodate two
concepts by using a combination of graphics and text that includes a tool
icon and a numeric index. Here a part of the metaphoric component is lost.
The number identifies a specific tool (e.g., tool 6) and does not convey
directly that the tool is Spy++.



(a) Tools menu entries: Visual Component Manager; Register Control; Error
Lookup; ActiveX Control Test Container; OLE/COM Object Viewer; Spy++; MFC
Tracer

(b) Three icons, each carrying the binary sequence “100101”

Figure 9. Combination of graphics and text



Figure 9(b) shows how three Borland C++ icons, “compile unit,” “make
project,” and “build project,” convey more concepts: three attributes are
encoded in each icon. The last icon combines attributes of icons 1 and 2,
taking the symbol “!” from the first icon and the yellow folder symbol
from the second icon. The first attribute, common to all three icons, is
the class of operations for the “steps of producing executable (binary)
code” (binary sequence “100101”). In addition, each icon conveys two
specific attributes. The first icon explicitly conveys the attributes (1)
compile (“!”) and (2) unit (page and blue color); the second icon conveys
the attributes (3) make (“?”) and (4) project (folder and yellow color);
the last icon conveys the attributes (4) “project” (folder) and (5)
“build” (“!”). Together these three icons form an “iconic sentence” that
depicts the complex object “Borland compiler and linker.” The use of color
permits immediate recognition that the second and third operations deal
with one entity and the first icon deals with a different entity. As you
can see, the project concept is encoded by two graphical features (a
folder and the color yellow). Similarly, the unit concept is encoded by
two other graphical features (a page and the color blue). Thus, there is
redundancy in this encoding: features are doubled graphically, resulting
in clearer and more quickly distinguishable icons. These icons also permit
one to see that the first icon indicates an action that produces
executable code in one step. In contrast, the same result can be produced
by using icons 2 and 3 together (the symbol “!”).

Each icon presents three concepts using four graphical features. We can
conclude that a simple 32 x 32 pixel icon is able to encode 3-4
independent concepts without causing any perception difficulties.

Generally, an iconic sentence that contains three icons can depict 9-12
concepts explicitly and can also represent many relationships between the
icons implicitly, such as: (i) whether icons represent the same object or
different objects (encoded by color and shape) and (ii) whether they
represent a subset of operations (symbols “?” and “!”). Table 14 shows
another example, with icons that each depict 3-4 concepts using iconic
metaphors. The analysis above shows that a typical software icon
explicitly presents 1-4 concepts per icon.
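
The counting argument can be made concrete with a minimal sketch that models each icon of Figure 9(b) as a set of concepts. The concept labels paraphrase the attributes named in the text; representing an icon as a Python set is an assumption made only for illustration.

```python
# Each icon is modeled as the set of concepts it encodes explicitly.
ICONS = {
    "compile unit": {"produce executable code", "compile", "unit"},
    "make project": {"produce executable code", "make", "project"},
    "build project": {"produce executable code", "build", "project"},
}

# Explicit concepts per icon and for the whole three-icon sentence.
per_icon = {name: len(concepts) for name, concepts in ICONS.items()}
total_explicit = sum(per_icon.values())

# An implicit relationship: icons 2 and 3 operate on the same entity.
shared = ICONS["make project"] & ICONS["build project"]

print(per_icon)
print(total_explicit)
print(sorted(shared))
```

With three concepts per icon the sentence carries nine explicit concepts, the lower end of the 9-12 range; the set intersection recovers the implicit "same object" relationship between the second and third icons.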

Table 14. Examples of Borland’s iconic convention

Icon | Vendor | Product type | Product subtype | Platform

(icon 1) | Borland (background color pattern) | Debugger (bug iconels) |
| Win32 (iconel 32)

(icon 2) | Borland (background color pattern) | Debugger (bug iconels) |
Installation program (disks iconel) | Win32 (iconel 32)
6.2 How many attributes can be implicitly encoded by a small icon?

At first glance, the observations above permit one to conclude that four 
independent concepts should be a practical maximum for the number of ele- 
ments depicted in a small icon. This would then justify a compression ratio 
(from text to icons) of at most 4:1. That is, four icons (each depicts only one 
concept) can be combined into one small icon that would represent four text 
concepts. 

The number of concepts implicitly encoded in an icon can be much lar- 
ger than three or four concepts. This is done by relaxing the requirement of a 
one-to-one match between an attribute and an iconic metaphor. Below we 
describe an example from a Singapore executive job service company 
[http://www.liahona.com.sg/]. 

This company provided a long description of (1) the benefits that their
client companies may provide to an employee, (2) application and resume
requirements, and (3) expatriate bonuses. Each of these areas is
represented by a single icon described in Table 15. The icons we use in
the table differ from the original icons but encapsulate similar
information.

Table 15. Example of complex concepts encoded in icons

Icon | Description [http://www.liahona.com.sg/icons+symbols.htm]

(icon: sizable benefits) | The company provides at least half of these
benefits: medical, dental, accident insurance, low interest loans for car
& housing, education assistance, transport allowance, technical &
development training, holiday subsidy plan, recreational facilities,
annual company function, compassionate, marriage, maternity, paternity,
childcare, examination leave, stock options purchase plan, profit sharing,
etc.

(icon) | The company has a comprehensive expatriate package that includes
most of these benefits: higher basis salary, overseas premium, housing
allowance, cost of living allowance, home leave, children’s education,
spouse make-up salary compensation, company car, tax equalization,
hospitalization & medical insurance, etc.

(icon) | Interested candidates are to apply with a detailed resume stating
work experience, educational qualifications, full personal particulars of
current & expected salary, starting date or resignation notice required,
contact numbers (during & after office hours), address, age, nationality,
marital status, language ability and driving license, a photograph and
supporting documents.
It is hard to measure how many concepts are really depicted in the three
icons in Table 15, but obviously it is more than four concepts or
attributes. This shows that a one-to-one mapping between text concepts and
icon features is avoidable, although a user may need assistance to learn
an implicit iconic language without a steep learning curve. Table 16
presents an example of two iconic sentences that summarize two open
positions. These iconic sentences are obviously shorter than the original
text and can be compared and correlated faster.



Table 16. Iconic sentences that summarize open positions

Columns: Due date | Transport nearby | Phone/Fax | Pay package | Benefits |
Expatriate package | Career growth | Resume requirements

Position 1 (icon labels in reading order): June 1 | 10 min walk |
781-1430 | 6-figure pay, >$100,000 | large | large

Position 2 (icon labels in reading order): July 9 | 10 min walk |
980-1036 | depends on experience | modest | modest | modest



7. PERCEPTION AND OPTIMAL NUMBER OF
GRAPHICAL ELEMENTS

7.1 Perception and icon design 



Context plays an important role in icon perception and should be re- 
flected in icon design. Above we described the context dependence of mili- 
tary icons. Icon context has a variety of aspects. For instance, objects drawn 
nearby can change the meaning of the icon. General gestalt laws of percep- 
tual organization [Preece, 1994] set up a perceptual framework for icon de- 
sign: 

• regions bounded by symmetrical borders tend to be perceived as 
coherent figures (see Figure 8(b)), 

• elements of the same shape or color tend to be seen as belonging
together, and

• the boundary contrast is better than a linear boundary for making a 
shape stand out. 
Another way to take the perceptual aspect into account in icon design is
the ecological approach [Gibson, 1979; Preece, 1994]: helping a user to
simply detect information rather than to construct information from the
image. Detection is a single-step process, but constructing may take two
or more steps.

For instance, a user needs to analyze information on a victim. If the
victim’s information is in two different spots (see Figure 10(a)), then
the user needs to assemble/construct this information before analyzing it.

In contrast, Figure 10(b) provides the victim’s information already
assembled as a single focus entity in the center of the window. The
Bruegel iconic system supports a mechanism for relocating icons and icon
elements so that correlated information can be viewed together.





Figure 10. Examples of Gibson’s ecological approach 



7.2 The optimal number of graphical elements 



Table 17 shows a summary of the maximum number of effective codes
(variations) of different graphical elements that can be used to visualize
objects and actions, based on [Preece, 1994]. Icon design would benefit
from following the principles associated with these results.

The encoding principles described in Table 17 are used in many visuali- 
zation applications including flowchart design. The basic idea is to represent 
objects and actions (see Figure 11) where the total number of object and/or 
action types encoded is determined in accordance with limits shown in Table 
17. 
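
A design can be checked against these limits with a short sketch. The dictionary below uses the upper bounds of the ranges listed in Table 17; the function over_limit and the sample design are hypothetical and serve only to illustrate the check.

```python
# Upper bounds of the effective-code ranges from Table 17.
MAX_CODES = {
    "abstract shape": 20,
    "color": 11,
    "line & angle": 11,
    "line length": 4,
    "line width": 3,
    "line style and fill": 9,
    "ratio of length and width": 5,
}


def over_limit(design: dict) -> list:
    """Return the graphical elements whose number of codes exceeds the limit."""
    return [
        element
        for element, used in design.items()
        if used > MAX_CODES[element]
    ]


# Hypothetical design: 15 object shapes, 12 colors, 3 line widths.
design = {"abstract shape": 15, "color": 12, "line width": 3}
print(over_limit(design))
```

Here only the color coding exceeds its limit, which suggests reducing the palette or moving some distinctions to another graphical element.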
Table 17. Principles of graphical coding in a single application based on [Preece, 1994]

Type of entity | Graphical element | Examples of entity | Maximum number of
effective codes | Perception comparison

Any | Alphanumeric | 12”, AK-47, high temperature, 35° | Practically
unlimited (self-evident meaning) | Words are scanned longer than letters.
Letters are scanned longer than digits.

Object | Abstract shape | Document, data, disk, ground, in air, in sea,
target, victim. | 10-20 | Scanned longer than color

Object | Color | Red: hostile, warning sign; blue: friend; green: neutral. |
4-11 | Scanned longer than digits, but faster than letters

Action/operation | Abstract shape | Flowchart symbols: sort, delay,
collate, decision, merge, manual operation. | 10-20 | Scanned longer than
color

Direction | Line & angle | Wind direction, attack direction | 8-11 |

Numeric attribute | Line length | Percentage, temperature, confidence,
size | 3-4 |

Numeric attribute | Line width | Percentage, temperature, confidence,
size. | 2-3 |

Numeric attribute | Line style and fill | | 5-9 |

Relation between attributes | Ratio of length and width | Correlation
between person’s weight and height. | 3-5 | Scanned longer than shape and
color.


a) Shapes for objects: data, document, multidocument

b) Shapes for operations: manual input, decision, manual operation

Figure 11. Example of contrasting shapes for objects and actions
8. CONCLUSION

This chapter described the state-of-the-art in iconic descriptive ap- 
proaches. These approaches are useful for annotating, searching and correlat- 
ing traditional databases and those containing images and multimedia. They 
are compact and can provide a quick response since they are psychologically 
appealing for users. 

Iconic annotations are based on concepts of base, compound and 
composite icons used to construct iconic sentences. For the iconic annotation 
process, grammars and ontologies were described using three representative 
systems. The first system considered was the MIT Media Streams system for 
annotating video. Another important iconic annotation system considered is 
known as military standard Mil 25-25, which provides a combination of 
icons, abbreviations, and short text to represent war fighting objects. 

The third system considered was the Bruegel system for annotating text.
Mil 25-25 and Bruegel are interesting from a conceptual viewpoint because
both use context-dependent visual grammars to represent war fighting
objects and text, while Media Streams uses a context-free visual grammar
to represent video content.

The concept of the composite icon is employed in both Mil 25-25 and the
Bruegel system. Composite icons are a natural way to introduce context to
icon design and produce compact and informative icons. Language
translation is another area that can benefit from the use of modern iconic
annotations. This idea was behind the development of the first such
system, Blisssymbology, designed by Bliss in the 1940s.

An iconic query is a “native way” of querying multimedia databases. In 
multimedia, iconic queries support the ability to search the space of iconic 
annotations for those icons annotating multimedia which satisfy a given 
search condition. 

An analysis of the practical use of small icons in a software graphical
user interface demonstrated that simple 32 x 32 pixel icons are able to
encode 3-4 independent concepts explicitly without any perceptual
difficulties. We also provided an example where each icon conveys more
than a dozen concepts implicitly.

The chapter concluded with a summary concerning the maximum num- 
ber of effective variations of different graphical elements that can be used in 
visualization and icon design. 
10. EXERCISES AND PROBLEMS



1. Design ten base icons and construct four compound icons using them. 
Build four iconic queries using the ten initial icons and four compound 
icons. Each query should contain at least four icons. 

2. Use the icons designed in exercise 1 and construct four composite icons.
Build four iconic queries using the ten initial icons and four composite
icons. Each query should contain at least four icons.

3. Build four composite icons that will encode six attributes each explicitly. 

4. Build four composite icons that will encode at least ten attributes implic- 
itly. 

5. Construct five iconic sentences that will summarize a one-page text of 
your choice. 
Abstract: This chapter addresses the problem of visually correlating
objects and events. A new Bruegel visual correlation system based on an
iconographic language that permits a compact information representation is
described. The description includes the Bruegel concept, functionality,
the ability to compress information via iconic semantic zooming, and
dynamic iconic sentences. The chapter provides a brief description of
Bruegel architecture and tools. The formal Bruegel iconic language for
automatic icon generation is outlined. The second part of the chapter is
devoted to case studies that describe how Bruegel iconic architecture can
be used for the visual correlation of terrorist events, for file system
navigation, for the visual correlation of drug traffic and other criminal
records, for the visual correlation of real estate and job market
offerings, and for the visual correlation of medical research, diagnosis,
and treatment.

Key words: Visual correlation, iconographic language, semantic zooming,
database visualization, iconic representation




1. INTRODUCTION 

The Bruegel visual correlation system permits the compact visual
annotation of information for objects and events along with their rapid
comparison and correlation, search and summary presentation. The system
was named after the Flemish painter Pieter Bruegel and was inspired by his
famous painting “Blue cloak” shown partially above (see Figure 1 and Table
1 in Chapter 8 for more detail). The main categories of possible visual
correlation systems are described in Chapter 9. The Bruegel iconic system
supports three categories. The first is a spreadsheet category where each
object and event (O/E) is represented as a spreadsheet of icons organized
in rows. The second category is a 3-D tree presentation where each O/E is
represented as a 3-D tree of icons located in nodes. Icons located at the
terminal nodes convey the most detailed information about the O/E, while
upper-level nodes convey more generalized information. The third
representation is a “planted” 3-D tree representation that combines a
geographic map or image with a 3-D tree “planted” at the location of the
O/E on the map/image. Each of these representations has its own semantic
zoomed form, which is conceptually described in section 2.2. This form
permits “semantic compression” of icons to get a more compact
representation.
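
The relation between tree depth and level of detail can be illustrated with a minimal sketch. It is not the Bruegel semantic zoom described in section 2.2; it only assumes that cutting an icon tree at a shallower depth yields fewer, more general icons. The class IconNode, the function zoom, and the sample labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IconNode:
    """A node of an icon tree; leaves carry the most detailed icons."""
    label: str
    children: list = field(default_factory=list)


def zoom(node: IconNode, depth: int) -> list:
    """Return the icon labels visible when the tree is cut at `depth`."""
    if depth == 0 or not node.children:
        return [node.label]
    labels = []
    for child in node.children:
        labels.extend(zoom(child, depth - 1))
    return labels


# Hypothetical O/E: general icon at the root, details at the leaves.
event = IconNode("attack", [
    IconNode("target", [IconNode("building"), IconNode("vehicle")]),
    IconNode("weapon", [IconNode("bomb")]),
])

for depth in range(3):
    print(depth, zoom(event, depth))
```

A smaller depth gives a shorter, more generalized icon string for the same O/E, which is the kind of compression the zoomed forms aim for.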

The Bruegel visual correlation system includes several components: 

• the Bruegel graphical language (BGL) that specifies the layering of
iconographic elements into complex icons to represent textual content
in a space-efficient manner,

• Dynico, a supporting tool for BGL which aids in the dynamic generation
of complex icons, iconic sentences, and spreadsheets of icons,

• 3DGravTree, a supporting tool for BGL that dynamically generates complex
3-D trees, including “planted” trees of icons, and

• other graphical tools to support BGL for the visual correlation of
complex objects and events.

2. THE MAIN CONCEPTS OF THE BRUEGEL 

ICONIC SYSTEM 

2.1 Bruegel functionality 

The dynamic iconic visual correlation system Bruegel enables users to
create multi-layered, iconic annotations of events such as medical patient
records (e.g., breast cancer patients), job advertisements, real estate,
file systems, criminal records, and terrorist attack records, along with
visual correlation of these events. Further, Bruegel provides a compact
iconographic visual representation and correlation of information that
extends the military symbology standard (Mil Std 25-25). Currently the system
contains an extension of the Mil Std 25-25 intelligence symbology library for
terrorist activities.

Experiments with a database on terrorist attacks in the 1980s have shown
that iconized data occupies 10 times less space than text. This means that
analysts and decision makers can potentially spend 10 times less time
browsing and analyzing iconized data.

The functionality of the Bruegel Visual Correlation system includes:

• Facilities to create new icon libraries, 

• Facilities to switch icon libraries for specific applications, 

• Facilities to browse icon libraries, 

• Multiple ways to represent and correlate objects and events as icon
strings and icon trees with different levels of semantic resolution
(macro view, middle-level view (list view), network view, clustering
view),

• Facilities to correlate and sort icon strings and trees (sentences) in
2-D and 3-D trees,

• Facilities to assign weights to attributes for visual correlation,

• Facilities to merge icons, and 

• Facilities for the 3-D correlation of iconized objects and events using 
flexible “gravitation” trees. 

The system also aids in the navigation of databases, in searching records 
visually, in correlating records, in discovering useful patterns in databases, 
and in supporting decision-making. Figure 1 demonstrates how an analyst 
can utilize the system functionality. 



[Figure 1 diagram, reconstructed as text]

Analyst's steps:
1. Represent “rich” objects and events (O/E) visually.
2. Portray differences and similarities between O/E visually.
3. Actively correlate objects and events visually.

Supporting facilities:
• Use system facilities to select textual/XML databases; to select a
reference database (e.g., a fragment of the World Factbook); to annotate/tag
text; to select an icon library; to browse an icon library; to create a new
icon library or edit an existing one; to restructure and compress O/E
visually.

• Use system facilities (macro view, list view, 3-D view) to display O/E in
multiple iconic ways with different levels of semantic resolution and
zooming.

• Use system facilities (visual query, list view, 3-D view) to observe
differences and similarities between O/E with different levels of semantic
resolution.

• Use system facilities (macro view, list view, 3-D view, network view,
clustering view (dendrogram), analytic task view) to actively correlate O/E
visually.

• Future system facilities: spiral visual correlation; grouped macro view
spiral correlation; iconic treemap; iconic Table Lens (combination of macro
and list views); effect visual slider; Pareto view; visual reasoning.


Figure 1. Sequence of analyst’s work with the Bruegel system with supporting facilities 

2.2 Dynamic iconic sentence compression 

Iconic sentences have a significant potential to encode information effi- 
ciently. The number of icons in the sentence and their complexity are major 
factors that limit human abilities for making sense of the encoded informa- 
tion quickly. 

Theoretically, all information can be compressed into a single icon. An
example is the “Blue cloak” painting by Pieter Bruegel, which compresses 78
Flemish proverbs (see Figure 1 and Table 1 in Chapter 8 for more detail). This
unique visual language provides the highest level of compression, but it is
not well structured for analysis and visual correlation. Other iconic
languages have a one-to-one mapping with text sentences. Consider, for
example, Bliss’s iconic language [Bliss for Windows, 2001], a sample of which
can be found in Figure 4(a) below. In such a language, practically every text
word is converted to an icon. Thus, an iconic sentence can take even more
space than the corresponding text. Still, the Bliss language fulfills its
purpose as a global international communication tool, a visual Esperanto;
however, it does not serve as a language for visual correlation.

Iconic languages such as Mil Std 25-25 and Media Streams described in 
Chapter 9 represent intermediate steps between “Blue cloak” and Bliss. Fig- 
ure 2 depicts the relative compression provided by these languages. 

[Figure 2 axis, from lowest to highest compression: Bliss, Media Streams, Mil Std 25-25, “Blue cloak”]

Figure 2. Levels of compression in iconic languages 

Different goals dictate different levels of compression. Thus, more and
more one finds iconic languages being designed for different goals. A better
approach is to build an iconic language that can dynamically change the
level of compression of an iconic sentence depending on the goal. Below we
outline the design of such a language. It has been partially implemented in
the form of our Bruegel iconic language.

Figure 3 shows how the concept of compression can be applied to iconic 
languages. It includes the concept of spatial compression as well as that of 
semantic compression. 







Figure 3. Dynamics of compression of iconic sentence. See also color plates. 

Iconic compression can be lossless, or it can permit some loss of
information in a manner similar to JPEG image compression. The difference is
that we work on high-level compression. In contrast, JPEG compression is a
low-level (pixel-level) compression.

The bottom level in Figure 3 shows an iconic sentence that matches icons
one-to-one to the database fields of an individual record. This sentence has
not been compressed yet. It can also be a nearly one-to-one matching of icons
to words in a natural language sentence. On the next level, each pair of
low-level icons is combined into a single icon, called a combined icon.
Figure 3 displays parent icons in the same color as their children since they
represent the same information. This process can be repeated until the iconic
sentence shrinks to a single icon.

An iconic sentence viewer would allow iconic sentences to be viewed at 
different levels of compression. A user can select the level of compression 
that best fits the specific analytical task the user is working on. This dynamic 
flexibility is not available in typical iconic languages. 

We use the notation “t-i compression” to denote the compression ratio
that occurs when text is substituted by an icon. Similarly, we use the
notation “i-i compression” to denote icon-to-icon compression, where several
icons are compressed into a single icon. The compression ratio is measured as
the ratio of the screen space occupied before compression (by the text or by
the original icons) to the screen space occupied by the resulting icons.

Our experiments have shown that at the first stage, the Bruegel iconic
visualization system reached a t-i compression ratio of two when all 24
attributes (slots) are mapped into 24 icons. Following this, the higher-level
icon-to-icon compression mechanisms were applied as depicted in Figure 3.
As a result, these 24 icons were compressed into 6 icons, yielding an i-i
compression ratio of four and a total t-i compression ratio of eight when the
two compressions are combined.

Compressing an iconic sentence in a single large step makes it hard to relate
the compressed icons to the original ones. The mechanism shown in Figure 3,
known as gradual compression, has been developed to meet this challenge. At
each step, two icons can be compressed into one icon, while icons with
important information can stay uncompressed. Gradual compression helps human
perception by allowing the discovery of relationships between different
levels of detail. It provides a smoother transition by first moving from 24
icons to 12 icons and then to six icons. In both cases, the ratio is
appropriate for human perception.
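
The arithmetic of gradual compression can be sketched in a few lines of Python. This is a minimal illustration, not the Bruegel implementation: the function names `compress_once` and `compression_levels` are hypothetical, icons are stand-in strings, and the i-i ratio is computed from icon counts under the assumption that all icons occupy equal screen space.

```python
from typing import List


def compress_once(icons: List[str]) -> List[str]:
    """Merge each adjacent pair of icons into one combined icon.

    A trailing unpaired icon is carried to the next level unchanged.
    """
    merged = []
    for i in range(0, len(icons), 2):
        pair = icons[i:i + 2]
        merged.append("(" + "+".join(pair) + ")" if len(pair) == 2 else pair[0])
    return merged


def compression_levels(icons: List[str], target: int) -> List[List[str]]:
    """Return every level, from the uncompressed sentence down to at most `target` icons."""
    levels = [icons]
    while len(levels[-1]) > target and len(levels[-1]) > 1:
        levels.append(compress_once(levels[-1]))
    return levels


# 24 attribute icons, as in the experiment described above.
sentence = [f"a{k}" for k in range(1, 25)]
levels = compression_levels(sentence, target=6)
sizes = [len(level) for level in levels]      # 24 -> 12 -> 6

T_I_RATIO = 2                                 # text-to-icon ratio reported in the text
i_i_ratio = sizes[0] / sizes[-1]              # icon-to-icon ratio, by icon count
total_ratio = T_I_RATIO * i_i_ratio           # combined text-to-icon ratio
print(sizes, i_i_ratio, total_ratio)
```

Each intermediate level is kept, so a viewer could let the user step between 24, 12, and 6 icons rather than jumping directly to the most compressed form.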

This compression process can also be animated. Again, the user can se- 
lect the level most appropriate for a specific task. We note that these levels 
are not predefined. It is a dynamic and user-controlled process. One of the 
objections typically raised against iconic and other visual languages is that a 
user needs to learn a new language. 
To make this process easier, the Bruegel system can be adjusted to be a
learning game that uses icons and text. The user can play with existing texts 
that already have a meaningful compression via the hierarchy of icons. The 
user needs to guess the meaning of the icons. Scores for a guess are calcu- 
lated from the difference between that guess and the real text content en- 
coded in the icons. The computer can play against the user showing only part 
of the icons. It also can be a network game with another person. 
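
One way such a score could be computed is sketched below. The text does not specify the scoring formula, so the word-level similarity measure used here (Python's standard `difflib.SequenceMatcher`) and the function name `guess_score` are assumptions made purely for illustration.

```python
from difflib import SequenceMatcher


def guess_score(guess: str, truth: str) -> float:
    """Score in [0, 1]: 1.0 for a word-for-word match, lower as the guess diverges.

    Assumption: the 'difference' between guess and text is measured on words,
    ignoring case. The actual Bruegel scoring rule is not given in the text.
    """
    guess_words = guess.lower().split()
    truth_words = truth.lower().split()
    return SequenceMatcher(None, guess_words, truth_words).ratio()


truth = "accomplished bomb attack"
print(guess_score("accomplished bomb attack", truth))  # identical guess
print(guess_score("planned bomb attack", truth))       # one word differs
```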

As a final note in this section, the described compression mechanism
supports semantic zooming. Semantic zooming differs from standard zooming:
standard zooming is completely pixel-based, while semantic zooming provides
an upper-level understanding of the image content.

2.3 Iconic annotation for dynamic objects and events 

Objects and events that change over time and space are a special challenge
to the iconic approach. Such dynamic objects and events can be represented
using three approaches [Chang, Bottoni, Costabile, Levialdi &
Mussio, 1998, 2002]. Table 1 contains a brief description of these
approaches (spatial-temporal, semantic explicit and semantic implicit).

Table 1. Approaches for representation of dynamic events (based on [Davis, 1995])

Approach            Description

Spatial-temporal    The normalization of temporal events by indexing temporal
                    and spatial changes using some temporal and spatial scales
                    and reference points.

Semantic explicit   Fixed semantics: semantically relevant atomic units
                    organized into various temporal patterns (repeated cycles,
                    scripts, etc.).

Semantic implicit   Unfixed semantics: a class of possible semantics is
                    identified implicitly by a physically-based description. A
                    physical action is mapped to a set of possible semantics in
                    concrete contexts. For an example see [Davis, 1995].
                    Physical description: two people shaking hands.
                    Semantics 1: 'greeting' if positioned at the beginning of a
                    business meeting shown in the movie shot.
                    Semantics 2: 'agreeing' if positioned at the end of the
                    same meeting (movie shot).



The Media Streams iconic system [Davis, 1995] uses the semantic implicit
approach because its goal is to search for a video segment with a similar
physical event. The Bruegel system follows all three approaches because,
when correlating events such as terrorist attacks, each of these
representations is needed.

3. DYNAMIC ICON GENERATION FOR VISUAL CORRELATION



3.1 Dynamic icon approach 

Below we describe an approach for dynamically generating icons and 
Dynico, a software system (part of the Bruegel visual language framework) 
for visualizing complex events as aggregate iconographic images. The over- 
all purpose of this visualization scheme is to move the process of recogniz- 
ing semantic correlations and patterns from the textual information domain 
to the visual domain thus allowing faster correlation discovery and decision 
making. Dynico consists of basic syntactic and semantic conventions that the 
user can augment with domain specific information. 

In the 1940s Bliss [Bliss, 1978, 2002] began a significant development of 
iconic communications for natural language sentences. Now Blissymbolics 
is supported by such software as Bliss for Windows [Bliss for Windows, 
2002]. The basic elements of Blissymbolics reflect the technical capabilities 
of the 1940s. Icons are not always intuitive and the number of possible com- 
binations is very small. This is probably one of the reasons for its limited 
acceptance. Currently it is feasible to implement more complex, realistic 
icons and generate new icons dynamically, on demand. 

We classify approaches in this area into three categories. 

1) Static approach - all icons are predefined and designed in advance 
using paint-type software (resizing and changing background colors is 
permitted). 

2) Static-dynamic approach - complex icons can be generated auto- 
matically from predefined icon elements by simple combinations (on 
the right, left, top, or bottom) with possible resizing. 

3) Dynamic approach - complex icons can be automatically synthesized 
using the content of the text to be visualized, structural template de- 
scriptions, and iconels with use of XML-type descriptions. 

The first approach works in applications where only a limited number of 
visual features or icons are needed, for example, control icons in a specific 
software system or making painter-like visualizations [Healey, 2001]. 
Healey formally defines what we call the static approach as a mapping

$\phi: A_j \to V_j$,

where $A_j$ is a data attribute from a set of data attributes $A$ and $V_j$
is a visual feature taken from a predefined set of visual features
$V = \{V_1, V_2, \ldots, V_n\}$.

Bliss’ system is an example of the second approach. It maps the words 
presented in a sentence to individual icons and places the icons sequentially. 
For instance, the system can produce a new icon by placing one icon above 
another to define a new concept. Consider the example in Figure 4(a) of the
"Love in marriage" icon generated from the two words “Love” and “Mar- 
riage.” 

The visualization of storytelling [Gershon & Page, 2001] illustrates the
complexity that can occur: massive texts and data often need more complex
combinations of icons. Gershon and Page present several storytelling
examples. For instance, the sentence “G-shape building is active between 8
and 10 a.m. and 4 and 6 p.m.” might be part of a story that should be
visualized.

For situations like this, we are developing a third approach. The first and
second approaches are not scalable for the iconographic visualization of
massive amounts of data, where potentially thousands of icons are needed. It
is not realistic to craft such a number of individual icons in a static or
static-dynamic manner. For instance, suppose we want to visualize the time
“3:45 p.m. and 10 seconds.” We may use a traditional watch icon with arrows
for hours, minutes, and seconds, and a colored dot indicating a.m./p.m.
status, as in Figure 4(b). The static approach to this visualization requires
432000 = 60 * 60 * 60 * 2 icons, assuming that each of the three arrows can
take any of 60 possible minute positions and that there are two indications
(a.m., p.m.). When one takes into account that some positions of the hands do
not portray critical data, we can cut the number of alternatives to
86400 = 12 * 60 * 60 * 2 icons, since we can assume that for any hour (out of
12) there are 60 * 60 * 2 combinations of the minute arrow, the second arrow,
and a.m./p.m. For such an application, the dynamic generation of visual
features (icons) on demand is a natural way to solve the problem.
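
The two icon counts above can be checked with a short Python sketch. The variable names are illustrative only; the second count is also confirmed by enumerating every (hour, minute, second, a.m./p.m.) state, which is the set of images a static library would have to store.

```python
from itertools import product
from math import prod

# Static approach: each of the three arrows at any of 60 positions, times a.m./p.m.
static_all = prod([60, 60, 60, 2])

# Reduced count: the hour arrow restricted to the 12 hour marks.
static_reduced = prod([12, 60, 60, 2])

# Cross-check the reduced count by enumerating every displayable state.
enumerated = sum(
    1 for _ in product(range(12), range(60), range(60), ("a.m.", "p.m."))
)

print(static_all, static_reduced, enumerated)
```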



The Dynico system addresses many shortcomings of static graphics in
these contexts. A single dynamic icon can be used to convey many facets of
a particular subject and can easily interact with other graphical components
due to its vector graphics content. As we mentioned above, static and
static-dynamic approaches do not attempt to incorporate the full dynamic
potential in their system design. To the best of our knowledge, the Dynico
system is the only dynamic system for creating aggregate iconographic images
in this manner.

[Figure 4 panels: (a) Blissymbolics: “Love and marriage” icon; (b) Dynico: dynamic clock icon]

Figure 4. Static and dynamic icons

The dynamic approach includes not only the dynamic generation of static
icons but also the dynamic generation of dynamic icons, such as framed
animation and key-framed interpolated animation. Since the goal of the
current version is speed of visual interpretation of textual information and
data, we have chosen to focus on static display states (static icons). This
also avoids the chronological masking of data that occurs in animated
graphics.

3.2 System architecture 

In the previous section, we motivated the need for a dynamic generation
system. This section discusses different approaches to developing a dynamic
iconic system. The problem is complex because: (1) the content of the text
and data should be used to dynamically identify parameters of visual features
in the icon, and (2) a dynamic approach exhibits complex visual and textual
content interactions that require strict control. This is in contrast with
the static visualization approach, which relies on standard, and thus
simpler, operations on bitmaps and similar files but provides minimal
flexibility.

Two competing ideas drive this issue: (i) the design of specialized,
content-specific tools, and (ii) the development of a relatively universal
mechanism. The first alternative can be used for the watch example presented
in Figure 4 in the following way:

• Generate five predefined iconels: H arrow, M arrow, S arrow, Watch
base, and Dot indicator for a.m./p.m.

• Compute the position of the M arrow on the Watch base as a func- 
tion of parsed text content: “<hour> 3 </hour>: <minutes> 45 
</minutes>: <seconds> 10 </seconds> <ampm> p.m. </ampm>”. 

• Compute the position of the H arrow on the Watch base as a func- 
tion of the same parsed text and the position of the M arrow already 
computed. 

• Compute the position of the S arrow and the color fill of the Dot
indicator for p.m.
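
The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not Dynico code: the function names `parse_time` and `arrow_angles` are hypothetical, arrow positions are expressed as clockwise angles in degrees from the 12 o'clock mark, and the Dot indicator is reduced to a boolean fill flag.

```python
import re

TAGGED = ("<hour> 3 </hour>: <minutes> 45 </minutes>: "
          "<seconds> 10 </seconds> <ampm> p.m. </ampm>")


def parse_time(text: str) -> dict:
    """Extract the tagged fields from the parsed text content."""
    fields = dict(re.findall(r"<(\w+)>\s*(.*?)\s*</\1>", text))
    return {
        "hour": int(fields["hour"]),
        "minutes": int(fields["minutes"]),
        "seconds": int(fields["seconds"]),
        "pm": fields["ampm"].lower().startswith("p"),
    }


def arrow_angles(t: dict) -> dict:
    """Clockwise angles in degrees from the 12 o'clock position."""
    s_angle = t["seconds"] * 6.0                       # 360 / 60 per second
    m_angle = t["minutes"] * 6.0 + t["seconds"] * 0.1  # M arrow creeps with seconds
    # The H arrow depends on the hour and on the M arrow already computed.
    h_angle = (t["hour"] % 12) * 30.0 + m_angle / 12.0
    return {"H": h_angle, "M": m_angle, "S": s_angle, "dot_filled": t["pm"]}


angles = arrow_angles(parse_time(TAGGED))
print(angles)
```

The dependence of the H arrow on the M arrow is exactly the kind of mutual location constraint that makes this design specific to the watch icon.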

Obviously, this design will work only for this watch icon and thus it is 
highly specialized. This motivates the development of a more general 
mechanism called the template-based mechanism. In this mechanism, possi- 
ble locations of individual iconels are identified in a template including their 
mutual location (as we have seen for hour and minute arrows). Thus, the 
specifics are encapsulated in a template. Indeed, Dynico supports a higher 
level of parsing of text contents and rendering images. In this higher level, 
another set of templates is used to generate complex icons using lower-level 
templates, which are combined into a single icon. 

3.2.1 Architecture details

Input information is mapped to a visual representation on the gross level, 
and then a particular state of the graphical content is chosen. This state is 
based on the specific qualities of the mapped content. The icon is modeled as 
a set of layers to be rendered sequentially in creating the complete icon. 
Each layer is associated with an icon element (iconel) that is a basic syntac- 
tic element of the icon. Sequences of iconels are not static and can be 
changed in a user-defined template. Each layer has: (1) a semantic content 
associated with an underlying text, and (2) a specified sub-region of the total 
icon area to which the display of icon element in that field is restricted. 

Syntactic units. In Dynico, each abstract icon consists of a number of
iconels, which are the basic graphical units with which the system works. An
iconel has an array of dynels, each of which maintains an array of frames,
each containing primitives made up of points. Thus, there is a hierarchy of
components, up to 5-6 levels deep depending on the level of an iconel or an
icon. Dynico also uses dynels to support animation as another aspect of
dynamic iconography. A sequence of frames operates like film frames or as a
series of still images representing different states of the graphical data.
The number of dynels can be large (50 or more) and can vary for different
icons. Naturally, dynels must be informed what to display. Designing a
reasonably simple way to communicate these structures is a challenging task.

Semantic units. A set of icons associated with a locality is called an icon
phrase. Icon phrases are combined into a “full thought” or an “event.”
Generally, the locality is a single icon used to reinforce the association of
the content; however, this is a current convention rather than a
restriction, and other grouping schemes can be realized through the use of
templating. An individual icon or a group of icons is associated with some
locality. A fuzzy logic membership function

$\mu_{\mathrm{locality}_j}(\mathrm{Icon}_i): \{\mathrm{Icon}_i\} \to [0,1]$

is used to associate $\mathrm{Icon}_i$ with $\mathrm{locality}_j$ if a strict
categorization is not appropriate for the task in hand.

Iconic templates. Templates are used to accommodate aspects of a user 
configuration such as user weighting of data, potential interfaces for tem- 
plate design, dynamic placement, and dynamic merging of icons (variable 
iconographic data resolution). 

Specification of syntactic units. All syntactic graphical units of Dynico
are vector graphics rather than bitmaps. Transformations with vector
graphics are usually simple. Iconels, dynels, frames, and primitives are
specified in a file called an SRT file. This is an XML-formatted file in
which each unit is described with attributes regarding points, such as
coordinate data and color. Figure 5 presents an example of an iconel
description.

<?xml version="1.0"?>
<ICONEL id="human armed.SRT" dimension="600,600" dynels="1">
  <DYNEL id="New Dynel" frames="2">
    <FRAME id="New Frame" primitives="24">
      <PRIMITIVE id="New Primitive" type="LINE" width="4"
                 stroke="44,200,44" fill="155,155,255" points="2">
        <POINT coordinates="585,513"/>
        <POINT coordinates="513,585"/>
      </PRIMITIVE>
      <PRIMITIVE id="New Primitive" type="LINE" width="4"
                 stroke="44,200,44" fill="155,155,255" points="2">
        <POINT coordinates="585,450"/>
        <POINT coordinates="450,585"/>
      </PRIMITIVE>
      <PRIMITIVE id="Gun" type="POLYGON" width="2"
                 stroke="1,1,1" fill="77,77,77" points="7">
        <POINT coordinates="387,315"/>
        <POINT coordinates="405,315"/>
        <POINT coordinates="405,225"/>
        <POINT coordinates="414,225"/>
        <POINT coordinates="396,180"/>
        <POINT coordinates="378,225"/>
        <POINT coordinates="387,225"/>
      </PRIMITIVE>
    </FRAME>
  </DYNEL>
</ICONEL>



Figure 5. XML representation of iconel data 
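
To show how such a description could be consumed, the following sketch reads an iconel with Python's standard `xml.etree.ElementTree` module and applies a simple scaling transformation. It is an illustration only: the embedded sample is a shortened, well-formed fragment modeled on Figure 5, and the functions `load_iconel` and `scale_points` are hypothetical names, not part of Dynico.

```python
import xml.etree.ElementTree as ET

# Shortened sample modeled on Figure 5 (one dynel, one frame, one primitive).
SRT_TEXT = """<?xml version="1.0"?>
<ICONEL id="human armed.SRT" dimension="600,600" dynels="1">
  <DYNEL id="New Dynel" frames="1">
    <FRAME id="New Frame" primitives="1">
      <PRIMITIVE id="New Primitive" type="LINE" width="4"
                 stroke="44,200,44" fill="155,155,255" points="2">
        <POINT coordinates="585,513"/>
        <POINT coordinates="513,585"/>
      </PRIMITIVE>
    </FRAME>
  </DYNEL>
</ICONEL>"""


def parse_ints(csv: str) -> tuple:
    """Turn a comma-separated attribute such as "585,513" into a tuple of ints."""
    return tuple(int(part) for part in csv.split(","))


def load_iconel(text: str) -> dict:
    """Collect the iconel size and the points of every primitive."""
    root = ET.fromstring(text)
    primitives = []
    for prim in root.iter("PRIMITIVE"):
        primitives.append({
            "id": prim.get("id"),
            "type": prim.get("type"),
            "stroke": parse_ints(prim.get("stroke")),
            "points": [parse_ints(p.get("coordinates")) for p in prim.findall("POINT")],
        })
    return {"id": root.get("id"),
            "size": parse_ints(root.get("dimension")),
            "primitives": primitives}


def scale_points(points: list, factor: float) -> list:
    """Scale vector coordinates about the origin; no resampling is needed."""
    return [(x * factor, y * factor) for x, y in points]


iconel = load_iconel(SRT_TEXT)
print(iconel["size"], scale_points(iconel["primitives"][0]["points"], 0.5))
```

Because the primitives are stored as points, resizing an iconel amounts to multiplying coordinates, which is one reason transformations on vector graphics stay simple.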



3.2.2 Current tools and usage 

The layering of iconographic elements into complex composite icons for
representing textual content in Bruegel is illustrated in Figure 6 for the
text “Accomplished bomb attack,” where the flag icon indicates
“accomplished” and the icon with two opposing arrows indicates an attack.
Figure 6(a) emphasizes “accomplished” through the location and size of the
iconel for “accomplished.” Similarly, Figures 6(b) and 6(c) emphasize
“attack” and “bomb,” respectively.

Below we describe a dynamic placement schema and the use of raw
iconels. Dynico works by operating upon a relatively small number of data
points. This property is exploited to further generalize the specification of
graphical elements, as they need only be designed to occupy the optimum
space for their individual design. The system can then transform the iconels
to the appropriate proportions and locations in order to represent the
semantic content behind them. In addition, User Defined Weighting (UDW) can
determine an iconel's actual location and its size within the icon. This may
more clearly emphasize the data that an analyst finds to be more important.

(a) (b) (c)

Figure 6. Icons with user defined weighting. See also color plates. 



How user priority interacts with the placement schema. Interaction is 
controlled by merging rules. The ability to collapse and expand icon groups 
in order to easily provide a user-required variable data resolution implies that 
some automated merging mechanism must be implemented in the system. 

To this end, a number of rules have been specified to guide the merging 
of icon elements. The rules are based on the geometrical attributes of graphi- 
cal content rather than data that the visual elements might represent. 

Templated Visualizations. With content encoded into icons, the use of
a template allows a description of how icons are to be grouped. For example,
a template might specify either a simple left-to-right ordering of content
or a more complex aggregate super icon (see Figure 7).



<ICON_DEF id="LIST Icon">
  <LAYER_DEF id="Background" iconel="Background.srp"/>
  <LAYER_DEF id="Main" x="10" y="10" width="80" height="80"/>
  <LAYER_DEF id="IconFrame" iconel="IconFrame2.srp"/>
</ICON_DEF><!-- LIST Icon -->

<STORYBOARD_DEF id="LIST Storyboard" using_def="Basic Dynico Icon" width="400"
                height="120">
  <PLACE_ICON id="Perp" x="40" y="40" width="100" height="100"/>
  <PLACE_ICON id="Perp Org" x="10" y="10" width="40" height="40"/>
  <PLACE_ICON id="Wep 1" x="10" y="145" width="30" height="30"/>
  <PLACE_ICON id="Wep 2" x="60" y="145" width="30" height="30"/>
  <PLACE_ICON id="Wep 3" x="110" y="145" width="30" height="30"/>
  <PLACE_ICON id="Targ Effect" x="150" y="10" width="60" height="60"/>
  <PLACE_ICON id="Perp Effect" x="150" y="110" width="60" height="60"/>
</STORYBOARD_DEF><!-- LIST Storyboard -->

Figure 7. The XML Template File data 
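
A storyboard template of this kind maps each named icon slot to a rectangle. The sketch below, which is an assumption-laden illustration rather than Dynico's own loader, reads the `PLACE_ICON` entries from a shortened fragment modeled on Figure 7 and computes the overall extent of the placed slots. The function names `placements` and `extent` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened fragment modeled on Figure 7.
TEMPLATE = """<STORYBOARD_DEF id="LIST Storyboard" width="400" height="120">
  <PLACE_ICON id="Perp" x="40" y="40" width="100" height="100"/>
  <PLACE_ICON id="Perp Org" x="10" y="10" width="40" height="40"/>
  <PLACE_ICON id="Targ Effect" x="150" y="10" width="60" height="60"/>
</STORYBOARD_DEF>"""


def placements(text: str) -> dict:
    """Map each slot id to its (x, y, width, height) rectangle."""
    root = ET.fromstring(text)
    return {
        slot.get("id"): tuple(int(slot.get(k)) for k in ("x", "y", "width", "height"))
        for slot in root.findall("PLACE_ICON")
    }


def extent(slots: dict) -> tuple:
    """Right-most and bottom-most pixel reached by any slot."""
    right = max(x + w for x, y, w, h in slots.values())
    bottom = max(y + h for x, y, w, h in slots.values())
    return right, bottom


slots = placements(TEMPLATE)
print(slots["Perp"], extent(slots))
```

Keeping the layout in a template means that regrouping icons requires editing only the slot rectangles, not the icons themselves.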

Smurfico is a software tool developed to ease the burden of designing 
dynamic icons (see Figure 8). It is a simple utility that allows editing of an 
SRT file either visually or textually. Smurfico provides access to all of an 
SRT's data using two formats: graphical and XML/Textual. The graphic 
view allows editing by directly manipulating the shapes that comprise the 
data in the file. Icons are also stored in SVG format. 

A number of tools are provided to control the various aspects of the file
such as primitive type, color, ordering and shape. Things that would be hard 
to edit textually are included in this view. This includes scaling a primitive, 
changing the rendering order using a tree control, moving primitives, or 
moving entire frames. 







Figure 8. Bruegel object-oriented icon design tool, Smurfico 



The XML View allows access to and editing of the textual details of the
file. This provides an easy way to change aspects of the file, such as
reordering large groups of Dynamic Icon Objects or copying/pasting data.
Rather than spending time designing interface features for every detail of
an SRT file, some data are more easily accessed from this XML View.



4. THE BRUEGEL ICONIC LANGUAGE FOR AUTOMATIC ICON GENERATION

An advanced iconic system cannot exist without an extensive library of
icons, as we discussed above. Manual icon design is very time consuming, so
it is desirable to generate icons automatically and dynamically on demand. To
be able to do this, a formal language is needed. The main idea of such a
language is to represent icons to be generated as language expressions $E_i$
with parameters $p_1, p_2, \ldots, p_k$, written $E(p_1, p_2, \ldots, p_k)$,
and to be able to automatically generate icons for all values of the
parameters that satisfy both graphic and semantic constraints. This allows
either selecting an icon from previously generated and stored parameterized
icons or generating such an icon dynamically, on demand, for specific
parameter values $p_1, p_2, \ldots, p_k$. This motivates the following
operations for producing combination icons in our formal Bruegel language:

1) post a subicon over another icon, called a background icon,

2) resolve the collision of subicons and iconels by moving icons,

3) post an iconel over another iconel,

4) resolve the collision of iconels by moving iconels.

The main difference between subicons and iconels is semantic: typically
an iconel is not used as an independent semantic entity; it is rather an
attribute or a property of the entity. For instance, the MS Windows iconel
“shortcut” needs to be posted on an icon that represents a shortcut to the
actual file rather than the file itself.

In Bruegel each icon is modeled as an object in Object Oriented Pro- 
gramming terms. New icon objects are created from parent icon objects. 
When one icon is posted over another icon, the new icon inherits properties 
of both icon objects. Below we describe an icon posting algebra based on a 
posting operation that produces a composite icon. 

Expression $O_1\,J\,O_2$ means that icon $O_2$ is posted on icon $O_1$, and
icon $O_1\,J\,O_2$ is called a composite icon. The posting operation $J$ is
called associative if expression $O_1\,J\,(O_2\,J\,O_3)$ is equivalent to
expression $(O_1\,J\,O_2)\,J\,O_3$, that is,

$O_1\,J\,(O_2\,J\,O_3) = (O_1\,J\,O_2)\,J\,O_3$.

Statement. If $O_2 \cap O_3 = \emptyset$ then the posting operation $J$ is
associative for $O_1$, $O_2$ and $O_3$.

The last statement means that icons $O_2$ and $O_3$ can be posted over the
icon $O_1$ in any order without overlapping each other, because their
intersection is empty. Thus the resulting icons $O_1\,J\,(O_2\,J\,O_3)$ and
$(O_1\,J\,O_2)\,J\,O_3$ will be identical. For associative icons $O_1$, $O_2$
and $O_3$ we will write simply $O_1\,J\,O_2\,J\,O_3$, omitting parentheses.

It will be assumed by default that
$O_1\,J\,O_2\,J\,O_3 = (O_1\,J\,O_2)\,J\,O_3$ if the icons are not
associative. The notation $O_1\,J\,(O_2\,J\,O_3)$ will also be read as
posting two icons over another icon. The expression $O_2\,(])\,O_3$ denotes
icon collision. Collisions are resolved by using operations such as those
shown below:

$10p{\leftarrow}O_2$ means to move icon $O_2$ 10 pixels to the left.

$10p{\uparrow}O_2$ means to move icon $O_2$ 10 pixels up.

$10p{\downarrow}O_2$ means to move icon $O_2$ 10 pixels down.

$80\%O_2$ means to produce a new icon that has 80% of the size of icon $O_2$.

More formally, the first collision-resolving rule can be written as:

$O_2\,(])\,O_3 \rightarrow (10p{\leftarrow}O_2)$

Below we show an example of a syntactically correct expression in Bruegel:

$O_1\,J\,(10p{\uparrow}80\%O_2,\ 80\%(O_3\,J\,60\%O_4))$.

This formula means that icon $O_1$ is used as a background; $O_4$, scaled to
60% of its original size, is posted on $O_3$; then $O_3$ with $O_4$ is scaled
to 80% and posted on $O_1$. Also, $O_2$, scaled to 80%, is moved 10 pixels up
and posted on $O_1$.
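
The meaning of this expression can be made concrete with a small Python sketch. It is a toy model under explicit assumptions, not the BGL implementation: icons are immutable records, `post`, `scaled`, and `up` are hypothetical helper names for the posting, percentage, and move-up operations, scaling a composite is assumed to scale everything posted on it, and "up" is assumed to mean a negative vertical offset in screen coordinates.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Icon:
    name: str
    dx: float = 0.0       # horizontal offset in pixels, relative to the parent
    dy: float = 0.0       # vertical offset in pixels (negative means up)
    scale: float = 1.0    # size relative to the original icon
    posted: Tuple["Icon", ...] = ()


def post(background: Icon, *subicons: Icon) -> Icon:
    """Posting: place the sub-icons over the background icon."""
    return replace(background, posted=background.posted + tuple(subicons))


def scaled(pct: float, icon: Icon) -> Icon:
    """n% O: a new icon with n% of the size of O."""
    return replace(icon, scale=icon.scale * pct / 100.0)


def up(pixels: float, icon: Icon) -> Icon:
    """Move the icon up by the given number of pixels."""
    return replace(icon, dy=icon.dy - pixels)


def flatten(icon: Icon, ox: float = 0.0, oy: float = 0.0,
            factor: float = 1.0) -> List[Tuple[str, float, float, float]]:
    """List (name, x, y, absolute scale) for every layer, background first."""
    x = ox + icon.dx * factor
    y = oy + icon.dy * factor
    size = factor * icon.scale
    layers = [(icon.name, x, y, size)]
    for sub in icon.posted:
        layers.extend(flatten(sub, x, y, size))
    return layers


O1, O2, O3, O4 = (Icon(n) for n in ("O1", "O2", "O3", "O4"))
composite = post(O1, up(10, scaled(80, O2)), scaled(80, post(O3, scaled(60, O4))))
for layer in flatten(composite):
    print(layer)
```

In this model $O_4$ ends up at 60% of 80%, that is 48%, of its original size, which follows from scaling the composite of $O_3$ and $O_4$ as a whole.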

This dynamic iconic language has an advantage over hardcoded icons.
For instance, with hardcoded icons, if we need to encode three attributes
with ten values each, then we need to produce 10 * 10 * 10 = 1000 icons
manually. But formulas like those presented above will produce all of the
required icons uniformly through the introduction of parameters such as
$a$, $b$, $c$ and $d$:

$O_1\,J\,(a\,p{\uparrow}b\%O_2,\ c\%(O_3\,J\,d\%O_4))$.

Parameter $a$ might have 32 different values in the range [0, 31] pixels,
while parameters $b$, $c$ and $d$ might have 11 values: 0%, 10%, 20%, ...,
90%, and 100%. In this way we generate parametric icons.
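
The size of this parametric family can be checked by enumeration. The sketch below assumes that the parameterized expression has the same shape as the example expression given earlier, with $a$ the upward shift in pixels and $b$, $c$, $d$ the three percentages; the textual rendering produced by `expression` is illustrative and is not BGL syntax.

```python
from itertools import product

A_VALUES = range(0, 32)            # a: shift in pixels, 0..31
PCT_VALUES = range(0, 101, 10)     # b, c, d: 0%, 10%, ..., 100%


def expression(a: int, b: int, c: int, d: int) -> str:
    """Render one member of the parametric family as plain text."""
    return f"O1 J ({a}p-up {b}%O2, {c}%(O3 J {d}%O4))"


all_params = list(product(A_VALUES, PCT_VALUES, PCT_VALUES, PCT_VALUES))
print(len(all_params))                     # 32 * 11 * 11 * 11 parameter combinations
print(expression(*all_params[0]))
print(expression(10, 80, 80, 60))          # the worked example from the text
```

A single formula therefore describes tens of thousands of icons that would otherwise have to be drawn one by one.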

A collision in BGL can be automatically detected using standard clipping
algorithms from computer graphics. For rectangular icons this is trivial, and
for round icons it remains simple. For more complex icons it can be more
challenging, but assuming vectorized icons presented as objects, we can use a
polygon clipping algorithm to test for collision.
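
For the rectangular case, collision detection and the move-left resolving operation can be sketched as follows. This is an illustration under assumptions: icons are reduced to axis-aligned bounding boxes, the names `Box`, `collide`, and `resolve_left` are hypothetical, and the 10-pixel move is applied repeatedly until the boxes no longer overlap, which is one possible reading of the resolving rule rather than a documented BGL behavior. Polygon clipping for complex icons is not covered.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def collide(a: Box, b: Box) -> bool:
    """True if the two rectangles overlap in a region of non-zero area."""
    return (a.x < b.x + b.width and b.x < a.x + a.width and
            a.y < b.y + b.height and b.y < a.y + a.height)


def resolve_left(moving: Box, fixed: Box, step: int = 10, max_steps: int = 100) -> Box:
    """Move `moving` left in `step`-pixel increments until it no longer collides."""
    steps = 0
    while collide(moving, fixed) and steps < max_steps:
        moving = moving._replace(x=moving.x - step)
        steps += 1
    return moving


o3 = Box(x=50, y=10, width=40, height=40)
o2 = Box(x=60, y=20, width=30, height=30)
resolved = resolve_left(o2, o3)
print(collide(o2, o3), resolved, collide(resolved, o3))
```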

BGL uses rules to avoid and resolve collisions. Below we present exam- 
ples of such rules: 

Rule 1. If iconel x is a human target and iconel y is an armament/weapon 
theny should be posted on x and made 80% of x size: 

jc J (80%y). 

Rule 2. If iconel x is a human target and iconel y is a target modifier then 
y should be posted on the predefined spot {mi, m 2 ) on frame f. 

J{mi,m 2 )J (30%y), 

where mi and m 2 depend onx, mi=m 2 {x); m 2 =m 2 (x). 

Rule 3. If iconel x is a human target in record #1 and iconel y is a human
target in record #2 and x ≠ y then y’s color should differ from the color of x:

f(x) J colorAlternate(n%y).

The last rule is an example of color alternation rules. To limit complexity
of icons, BGL prohibits posting expressions (formulas) with more than four
posting operators. Alignment of posted icons in the case of their
discrepancies is made by applying conflation algorithms. This dynamic iconic
language has an advantage over hardcoded icons; formulas like those presented
above can produce icons uniformly.
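
The limit on posting operators can be checked mechanically once a formula is held as a tree. The sketch below is one possible representation, not the Bruegel implementation: the `Icon` class, the `post` method, and the convention that one posting operator corresponds to one icon that has other icons posted on it are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List

MAX_POSTING_OPERATORS = 4  # limit stated in the text


@dataclass
class Icon:
    """An iconel with optional icons posted on top of it."""
    name: str
    scale_percent: int = 100
    posted: List["Icon"] = field(default_factory=list)

    def post(self, *icons: "Icon") -> "Icon":
        """Place the given icons on this one and return this icon."""
        self.posted.extend(icons)
        return self


def count_posting_operators(icon: Icon) -> int:
    """Count icons in the tree that have at least one icon posted on them."""
    own = 1 if icon.posted else 0
    return own + sum(count_posting_operators(child) for child in icon.posted)


def is_allowed(icon: Icon) -> bool:
    """Check the formula against the four-operator limit."""
    return count_posting_operators(icon) <= MAX_POSTING_OPERATORS


# The example formula: O4 at 60% on O3, that pair at 80% on O1, plus O2 at 80%.
o3 = Icon("O3", 80).post(Icon("O4", 60))
o1 = Icon("O1").post(Icon("O2", 80), o3)
```

Here `o1` uses two posting operators, so it passes the check; a chain nested five levels deep would be rejected.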

There has been related work conducted in the area of icon algebra
[Chang, 1989; Chang, Polese, Orefice & Tucci, 1997]. The main differences
between our work and these developments are in the goals. We are interested
in automatic dynamic icon generation, while Chang et al. are focused on
deriving natural language concepts (from icons) by applying formal
operators on the icons and their semantic meaning, which are limited to a set
of predefined icons and meanings. These differences in focus produce rich
semantic operations in Chang’s icon algebra and rich iconic operations in
our iconic algebra. These two sets of operations can be combined if some
goal were to require both types of operations.

The motivation of Chang’s icon algebra can be illustrated by the following
situation. Assume that we have a person who cannot speak, but can
communicate by composing iconic sentences from a limited set of icons and 
their modifiers. Having a limited set of icons, much smaller than the total 
number of words in English, a person who cannot speak is forced to combine 
available icons and modifiers to produce other words. 

For some words, the combinations can be acceptable, leading to a good
approximation, while for other words, the result can be almost arbitrary or
perhaps a distant metaphor. To express the word “cold,” a person who cannot
speak could select two icons “icicle” and “thermometer.” The problem we 
face is that we want to retrieve just the word “cold” and not the word “win- 
ter” or “solid body” from these two icons. 

Our goal is quite different. We assume that we are providing a system
for a decision maker or an analyst who can speak, write, select, and analyze 
images including icons, but that the user does not have time to read a long 
text. That is, we assume such a person would prefer to read and “decipher” 
an iconic sentence that summarizes long text. 

Further, we assume that the user can get the exact icon meaning by, say,
placing the mouse over an icon, causing the icon meaning to pop up and be
read. Thus, the major problem of Chang-type applications simply does not
exist in our environment. In a Chang application, a person with a speech
disability is very limited in his abilities to combine icons and to produce
the exact icon that is needed. We have no such severe limitation; icons can be
combined extensively and thus a relatively large set of concepts can be
directly encoded in icons.

5. CASE STUDIES: CORRELATING TERRORISM
EVENTS

Iconic annotation of database records means creating another database, 
where some textual attributes of records are augmented or substituted by 
icons. There are several reasons for an iconic annotation of databases. Iconic 
annotations can help navigate the database, search records visually, correlate 
records, discover useful patterns in databases, and support decision making. 
People usually process alphanumeric textual information sequentially, but
can process images (icons) in parallel, which is much faster.

The Bruegel visual correlation system has several potential applications
in homeland security, defense, intelligence and crime prevention through 
activities such as tracking terrorist activities, which includes identifying mo- 
dus operandi and estimating future terrorist threats, tracking weapons of 
mass destruction, and drug trafficking. 

5.1 MUC data description 

In this case study, we use DARPA MUC-3 and MUC-4 data on terrorist
activities in Latin America in the 1980s. MUC data are now in the public
domain at NIST and downloadable from [MUC Data Sets, NIST, 2001].

Table 2 presents a summary of the raw text corpus and Table 3 provides a
sample raw text message.



Table 2. MUC raw text corpus description

Characteristic                      Description

1. Data sources                     The Foreign Broadcast Information Service
2. Text types                       Newspaper and newswire stories, radio and TV
                                    broadcasts, interviews, rebel communiques,
                                    summary reports, transcripts from speeches
                                    and interviews
3. Location                         Latin America
4. Original language                Spanish
5. Text grammar                     Well-formed sentences, all in upper case
6. Number of texts                  1300 texts
7. Individual text size             Average size is 12 sentences (~0.5 page);
                                    smallest text: one paragraph;
                                    largest text: two pages
8. Number of sentences              15,600 sentences
9. Number of unique lexical items   18,240 lexical units
10. Number of words                 400,000 words
11. Number of events in a text      1-5 events per single text source
12. Average sentence length         27 words
13. Timeframe                       1980s
14. Terrorism relevant texts, %     50%

DARPA has sponsored content extraction competitions based on this text
corpus [Cardie, 1994; Hobbs, Appelt, Bear, Israel, Kameyama, Stickel, &
Tyson, 1996; Lehnert & Sundheim, 1991; Lehnert, Cardie, Fisher, McCarthy,
Riloff, & Soderland, 1992a,b; Lehnert, 1994; MUC-3, 1991; MUC-4, 1992].
The competitions are called Message Understanding Conferences (see
specifically MUC-3, 1991 and MUC-4, 1992).

Table 3. A sample of raw text 

DEV-MUC3-0008 (NOSC) 

BOGOTA, 9 JAN 90 (EFE) - [TEXT] RICARDO ALFONSO CASTELLAR, MAYOR OF 
ACHI, IN THE NORTHERN DEPARTMENT OF BOLIVAR, WHO WAS KIDNAPPED 
ON 5 JANUARY, APPARENTLY BY ARMY OF NATIONAL LIBERATION (ELN) 
GUERRILLAS, WAS FOUND DEAD TODAY, ACCORDING TO AUTHORITIES.
CASTELLAR WAS KIDNAPPED ON 5 JANUARY ON THE OUTSKIRTS OF
ACHI, ABOUT 850 KM NORTH OF BOGOTA, BY A GROUP OF ARMED MEN, WHO
FORCED HIM TO ACCOMPANY THEM TO AN UNDISCLOSED LOCATION. POLICE
SOURCES IN CARTAGENA REPORTED THAT CASTELLAR’S BODY SHOWED
SIGNS OF TORTURE AND SEVERAL BULLET WOUNDS. CASTELLAR WAS
KIDNAPPED BY ELN GUERRILLAS WHILE HE WAS TRAVELING IN A BOAT DOWN
THE CAUCA RIVER TO THE TENCHE AREA, A REGION WITHIN HIS JURISDICTION.
IN CARTAGENA IT WAS REPORTED THAT CASTELLAR FACED A “REVOLUTIONARY
TRIAL” BY THE ELN AND THAT HE WAS FOUND GUILTY AND
EXECUTED. CASTELLAR IS THE SECOND MAYOR THAT HAS BEEN MURDERED 
IN COLOMBIA IN THE LAST 3 DAYS. ON 5 JANUARY, CARLOS JULIO TORRADO, 
MAYOR OF ABREGO IN THE NORTHEASTERN DEPARTMENT OF SANTANDER, 
WAS KILLED APPARENTLY BY ANOTHER GUERILLA COLUMN, ALSO BELONG- 
ING TO THE ELN. TORRADO’S SON, WILLIAM; GUSTAVO JACOME QUINTERO, 
THE DEPARTMENTAL GOVERNMENT SECRETARY; AND BODYGUARD JAIRO 
ORTEGA, WERE ALSO KILLED. THE GROUP WAS TRAVELING IN A 4-WHEEL 
DRIVE VEHICLE BETWEEN CUCUTA AND THE RURAL AREA KNOWN AS CAM- 
PANARIO WHEN THEIR VEHICLE WAS BLOWN UP BY FOUR EXPLOSIVE 
CHARGES THAT DETONATED ON THE HIGHWAY. 



The case study uses structured data called the development corpus that 
has been produced by 15 MUC teams from raw texts during MUC-3/MUC-4 
using manual categorization and tagging. The MUC-3/MUC-4 task was to 
automatically extract information about terrorist incidents from raw test 
texts compiled for two years using the structured development corpus as 
training data. The goal was to determine when a given text contained rele- 
vant or irrelevant information. Each team stored their results in a template 
format, one template per event. 

The template format contains 24 attributes called slots. The text example
above has been converted to two output templates because it describes
two terrorist events: the kidnapping of Castellar and the bombing of
Torrado’s four-wheel drive vehicle.

Template attributes cover the date and location of the incident, the type
of incident (24 types of violence), the perpetrators, victims, and physical
targets in the attack, and the effect on the target. If there is more than one
value for a slot then options are separated by a “/” [Hobbs et al., 1996].
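
The “/” convention for alternative slot values is easy to handle when templates are read into a program. The sketch below assumes a template is already available as a mapping from slot names to raw strings; the slot names and values in the example are made up for illustration and are not copied from an actual MUC template.

```python
def parse_slot(raw):
    """Split a raw slot value into its alternative options on "/"."""
    return [option.strip() for option in raw.split("/") if option.strip()]


def parse_template(slots):
    """Parse every slot of one template (a dict of slot name -> raw string)."""
    return {name: parse_slot(value) for name, value in slots.items()}


# Hypothetical template fragment, loosely based on the sample text above.
example_template = {
    "INCIDENT TYPE": "KIDNAPPING",
    "PERPETRATOR": "ELN GUERRILLAS / GROUP OF ARMED MEN",
    "PHYSICAL TARGET": "",
}
parsed = parse_template(example_template)
```

Keeping each slot as a list, even when it holds a single option, lets later steps treat single-valued and multi-valued slots the same way.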

The twenty-four types of violence include eight basic types (such as mur- 
der, bombings, kidnappings, arson, and attacks) plus two variations on each 
(for threatened incidents and attempted incidents). There are also judgmental 
attributes such as perpetrator confidence, concerning the reliability of the 
perpetrator's identity. 

The manual process of obtaining structured data as a development corpus 
was time consuming and non-trivial: 

... it takes an experienced researcher at least three days to cover 100 texts
and produce good quality template representations for those texts. This is
an optimistic estimate, which assumes familiarity with a stable set of
encoding guidelines [Lehnert & Sundheim, 1991]

Although automatic content extraction is not the focus of our research,
MUC results set up a benchmark for the level of effort needed for obtaining
structured data as input for the Bruegel iconic visual correlation system.

5.2 What needs to be visualized in icons? 

The algorithm for deciding what part of a textual message will become 
iconic and what will be placed into the “mouse over” pop-up uses three crite- 
ria: 

• Terms included in an ontology (keywords) go into the icon,

• The most frequent terms go into the icon, and 

• Personal names and individual organization names go into the 
“mouse over” pop-up. 
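
The three criteria can be read as a small routing function. The sketch below is one possible reading, not the Bruegel algorithm: the function name, the use of plain sets, the precedence given to proper names, and the default of sending unmatched terms to the pop-up are all assumptions.

```python
def route_term(term, ontology, frequent_terms, proper_names):
    """Return "icon" or "mouse_over" for a single term.

    Assumed precedence: proper names win over the other two criteria,
    and terms that match no criterion default to the pop-up.
    """
    key = term.lower()
    if key in proper_names:
        return "mouse_over"
    if key in ontology or key in frequent_terms:
        return "icon"
    return "mouse_over"


# Illustrative vocabularies (hypothetical, lower-cased).
ONTOLOGY = {"dynamite", "civilian"}
FREQUENT = {"bombing"}
NAMES = {"castellar"}
```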

To build an ontology, we analyzed the frequencies of MUC terms. The most 
frequent terms have been found in the following steps: 

Step 1. Create a parser to dissect the MUC files, particularly the data
fields where three classes of phrases were defined based on their context:

• KEYWORDS such as accomplished and civilian.

• PLAIN TEXT STRINGS such as names and extended text descriptions.

• DESCRIPTORS such as further data about locations, e.g., city and town.

Step 2. Run the parser on MUC data.

Step 3. Format and track the results with regard to their usage count and 
location within categories and sub-sections. 

Step 4. Generate a list of the 100 most frequently occurring phrases in the
MUC_TST_1 and MUC_TST_2 files.

Step 5. Design icons for the 100 most common words tracked, which
constitute a large portion of the data contained in the MUC files.

As the usage count drops below fifteen, the PLAIN TEXT STRINGS
are more often proper names that would be better suited to “mouse over”
data than to visualization.
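
The counting in Steps 3 and 4 and the usage-count observation can be sketched with the standard library. This is an illustration under assumptions: the parser output is taken to be a flat list of phrase strings, the function names are hypothetical, and the threshold of fifteen is used as a simple cut-off although the text presents it only as a tendency.

```python
from collections import Counter


def most_frequent_phrases(phrases, top_n=100):
    """Return the top_n (phrase, count) pairs, most frequent first."""
    return Counter(phrases).most_common(top_n)


def split_by_usage(counts, threshold=15):
    """Split phrases into icon candidates and mouse-over candidates.

    counts maps phrase -> usage count; phrases used fewer than
    `threshold` times are assumed to be mostly proper names.
    """
    icon_candidates = [p for p, n in counts.items() if n >= threshold]
    mouse_over = [p for p, n in counts.items() if n < threshold]
    return icon_candidates, mouse_over


# Made-up parser output for illustration.
sample = ["DYNAMITE"] * 20 + ["CASTELLAR"] * 2 + ["CIVILIAN"] * 15
top_two = most_frequent_phrases(sample, top_n=2)
icons, popups = split_by_usage(Counter(sample))
```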

A large number of the words identified were KEYWORDS. These were 
the most likely candidates to be represented as fairly static icon elements 
often forming iconels. Some PLAIN TEXT STRINGS occurred often 
enough that they could be considered key words. Dynamite was a commonly 
occurring word of this type. Most other PLAIN TEXT STRINGS were 
good candidates for “mouse over” data (MOD) available on a lower level of 
information. Finally, descriptors are always associated with a location and 
thus can be represented as simple iconel identifiers as we will see below. 

We defined the following characteristics for location: name, geographi- 
cal location (coordinates), importance/threat, classification (friend, hostile, 
neutral) that needed to be conveyed by the system. Similarly, for people, we 
defined location characteristics with associated nation, affiliation and several 
others. 

5.3 A Demonstration 

Characteristics that take on graduated values are prevalent in the MUC
files. They include organizational confidence, damage (physical and human),
and quantity (number and total number). We explored a variety of options
for visualizing these values. Some options are shown in Figure 9. Both
options in this figure use “slash” scales incorporated into the icons.



(a) The slashes across the lower right indicate that this is a small band
(blue scale) of armed rebels (or terrorists).

(b) This icon describes a fairly large number of civilian targets (blue
scale) that were hurt pretty badly (red scale) by some action.



Figure 9. Possible icons for MUC concepts. See also color plates. 



In some cases, it may be beneficial to encode more than one parameter in 
such scales. For instance, the number of targets and the level of damage 
caused to them can be encoded in the same icon. This can be accomplished 
by offsetting on two scales. Figure 9(b) shows offsetting of red scale (level 
of damage) and blue scales (the number of people) to avoid their overlap- 
ping. Usability studies have shown that users can easily read case 9(a) while 
users need some accommodation time for case 9(b). 

The intent of the vertical scale in Figure 9 is to represent the confidence
level about the truth of the data. In particular, it can show low confidence in
the quantity or quality of the information displayed in the icon. Thus, the
meaning of the vertical scale and “slash” scales is content-dependent.

The meaning depends on the main content of the icon. For example, a
yellow mark in the green section of the vertical scale in Figure 9(a) can
indicate a high confidence in the data depicted in this icon. It clearly
cannot represent confidence in other data not present in the icon.

The vertical scale also can be interpreted as two additional content- 
dependent scales that represent the relevance of the icon content to the 
evaluating party. The top green half can represent importance, the bottom 
red half can represent threat. Assigning values may require more information 
about an item than is available from the source being visualized. In this case 
there must be some database available that can supply additional informa- 
tion. 

Figures 10-14 show the icons developed as a result of the analysis of the 
most common words in the MUC files. These icons are used in Bruegel’s 
listview to represent records for visual correlation. Another view
implemented in Bruegel is called macroview. It is mostly based on simple filled
rectangles, possibly combined with simple texture. These simple icons allow
the encoding of more records into a single screen than the rich listview icons
allow, but macroview icons provide less detail.

Figure 10. Bruegel icon examples: base icons. See also color plates.



Bruegel also allows complex combinations of icons and iconic sentences:
some icons can represent information in a very detailed way, and some icons
can be generalized as macroview icons.
These combinations serve as a tool to present information on multiple
levels simultaneously. Records presented as iconic sentences and located on 
the same screen potentially generate a variety of patterns that can aid in 
their visual correlation. In addition, contradiction between records can be 
revealed in these patterns. 



Organization (a tree icon) of a medium size (encoded by green lines) and
relatively high confidence (encoded by a yellow mark on red).

Several soldier perpetrators, encoded by the soldier icon, a red modifier
for perpetrators, 5 blue lines for several, and a yellow mark for medium in
the red confidence scale.

Terrorist act with dynamite and significant damage (red lines).

Figure 11. Bruegel composite icons

Figure 12. Bruegel icon examples: targets



Attack: terrorist bombing; terrorist; terrorist kidnapping; terrorist
gunning; state-sponsored violence (flag as an indicator).



Figure 13. Bruegel icon examples: attacks 



In macroview, we deliberately designed some icons to be able to produce
large visible patterns although each individual icon or a spider’s web.

Lens; small country; location icon with town modifier; location: icon
type 3 (the flag indicates a country); location icon with department
modifier.



Figure 14. Bruegel icon examples: location icon types. See also color plates. 



The convention of BIL is to use a base icon and attached modifiers. For 
instance, the first icon in the second row of Figure 12 shows a government 
official as a target using the target modifier. This official was also pretty im- 
portant as can be seen by the scale on the left. Similarly, the icon in the mid- 
dle of the second row shows that a fair sized band of soldiers from a rather 
hostile government committed some act, most likely against something or 
someone important to the country. If more than one target is described, we
suggest that, so as not to hinder intuition, only the most important target
be represented, with the others implied by the Quantity scale, as shown
here.

The iconic language presented above provides one example of the
languages that can be loaded into the Bruegel system along with the
appropriate ontology and the icons matched to it. Some implemented examples
are presented in Figure 15. A time test for reading iconic sentences in
Bruegel is available online [Kovalerchuk, Brown & Kovalerchuk, 2001].

Figure 15. Bruegel case studies. See also color plates.

6. CASE STUDIES: CORRELATING FILES AND
CRIMINAL EVENTS



6.1 Visualization for file system navigation 



Computer users spend millions of hours navigating file systems with 
thousands of files using tools such as Windows Explorer. Iconic annotation 
of databases can make this task easier. Currently, iconic database annotation
is in its infancy. For instance, Windows Explorer shows 1-5 attributes of
files and uses a single icon. To get more information about a file, a user 
needs to dig deeper; often it requires opening a file and browsing its con- 
tents, which can be time consuming. 

Available search tools can help only if a useful keyword is known. In a 
number of circumstances, such a keyword is unknown. In cases like this, 
users might be more successful in their search by carrying out their naviga- 
tion in an iconic file system. Consider first the information provided by 
Windows Explorer and similar tools. The “Large icons” option shows the
file name, file type, and an icon, while the “Details” option additionally
provides the file size, time of modification, and attributes such as archive,
read-only, compressed, hidden, and system. To get more information a user pops
up a “properties” menu for each individual file. In this way a user will get 
more information such as when a file was created, modified and security 
details (permissions, auditing, and ownership). 



Table 4. Iconic sentences that summarize file descriptions in a file system

Columns: Date created, Name, Modified, Category, Security, Platform,
Similar to X, Size. (The three rows consist of icons.)




Such a traditional GUI does not help to compare or correlate these
attributes for different files, because it shows one file at a time. Table 4
illustrates an iconic description of files. Here category can represent
attributes such as archive, compressed, system, or hidden, while security can
be represented by permissions, auditing, and ownership. The size of the icons
in the “Size” field represents file size: a larger icon represents a larger
file. We use a logarithmic scale to encode size. Similarly, a larger calendar
icon represents more recent dates of file creation and modification. For
instance, if a user wants to free some space by looking for old large files
that can be deleted, the sizes of the icons for dates and file sizes quickly
reveal those files. In general, the whole ontology of file system concepts can
be encoded in icons in our Bruegel system as XML files. Visual correlation and
navigation in such visual sentences (visual databases) can be beneficial in a
variety of applications, some of which we illustrate below.
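
A logarithmic size encoding of this kind can be sketched as follows. The text states only that the scale is logarithmic; the pixel bounds, the reference size of 2**40 bytes, and the function name are assumptions chosen for illustration.

```python
import math


def icon_size_px(file_size_bytes, min_px=8, max_px=48, max_bytes=2 ** 40):
    """Map a file size to an icon edge length on a logarithmic scale.

    Sizes of at most one byte get the smallest icon; sizes at or above
    max_bytes get the largest one.
    """
    if file_size_bytes <= 1:
        return min_px
    fraction = min(math.log2(file_size_bytes) / math.log2(max_bytes), 1.0)
    return round(min_px + fraction * (max_px - min_px))
```

With these defaults a 1 MiB file (2**20 bytes) sits exactly halfway between the smallest and largest icon, which is the point of the logarithmic scale: files that differ by orders of magnitude remain distinguishable without the largest ones dominating the row.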

6.2 Visual correlation for drug trafficking 

An iconic annotation of a drug trafficking database can make explanatory
analysis easier. In addition, the same general benefits of iconizing the DB 
are applicable here. These benefits include intuitive iconic queries and 
quickly browsing the DB contents. A traditional DB GUI interface was not 
designed to assist in comparing/correlating drug trafficking records. Table 5 
illustrates an iconic description of drug trafficking records. The depicted 
categories include date, location, offender, delivery storage, transit point, 
value, witness, and legal aspects. Additional categories may include a record 
ID and security information. 



Table 5. Iconic sentences that summarize drug trafficking records

Columns: Date, Location, Offender, Delivery/storage, Transit point, Value,
Witness, Legal. (The rows consist of icons.)



As before, the whole ontology of drug trafficking can be encoded in 
icons as XML files. Similar to file system navigation, the size of the icons 
also conveys information. For instance, the size of the “Legal” icon can
represent the amount of legal information collected or level of potential legal
implications for the offender. Visual correlation and navigation in such
visual sentences can reveal patterns and new trends in drug trafficking.



6.3 Visual correlation of criminal records 

Much like the iconization of a drug traffic DB, an iconic annotation for 
criminal records can improve the ease with which explanatory studies are 
conducted. Table 6 illustrates an iconic description of criminal records that 
includes date, location, offender, tools, victim, harm, witness, and legal as- 
pects. 

Visual correlation of displayed records quickly reveals the variety of lo- 
cations, tools, victims, and witnesses of the criminal acts on the same day 
and similar categories of offenders. 

Table 6. Iconic sentences that summarize criminal records 
Date Location Offender Tools Victim Harm Witness Legal 




7. CASE STUDIES: MARKET AND HEALTH CARE 



7.1 Visual correlation of the real estate market 

A real estate market provides another example where an iconic annota- 
tion of a DB provides a simpler explanatory analysis. Consider for example, 
comparing several offerings at once and selecting the most appropriate one. 

Table 7 illustrates an iconic description of real estate offerings that
includes date, available transport nearby, contact information (phone, fax),
price range, number of floors and rooms, quality of schools nearby, and legal
aspects. Visual correlation of displayed records quickly reveals the variety of
prices, sizes, and other parameters of offerings to select from.



Table 7. Iconic sentences that summarize houses for sale

Columns: Date, Transport, Phone/Fax, Price, Number of floors, Rooms,
School, Legal. (The rows consist of icons.)





7.2 Visual correlation for job search 

Job market records can be naturally iconized, again enhancing explanatory
analysis. Here the activities include comparing available jobs and
candidates. Table 8 illustrates an iconic description of jobs available that
includes due date, available transport nearby, contact information (phone,
fax), pay package, benefits, relocation package, career growth opportunities,
and job requirements. Larger icons indicate greater benefits and
requirements. An icon “6FP” stands for “six figure pay”. There is a natural
order between pay icons: 6FP > $$$ > $$.

Visual correlation of the displayed records quickly reveals that there is
one highly paid job with significant benefits and requirements, another job
that is more modest in these parameters, and a third job opening that is well
paid but with lower benefits and career growth opportunities. Such a visual
comparison demonstrates an easier method for multicriteria comparison along
with decreasing the possibility of memory overload. The Bruegel system
supports the identification of otherwise non-comparable offerings that are
closest to a given job description (query).

In more formal terms, these offerings can be represented as a set of
n-dimensional vectors x_i = (x_i1, x_i2, ..., x_in), where each vector
corresponds to a row in Table 8 or its extension that contains more rows.
Parameters such as pay package, benefits, and relocation package have a natural
order of their values, e.g., for pay packages we have $50000 < $60000. However,
two vectors x_i and x_j might not be ordered, e.g., rows 2 and 3 in Table 8 are
not ordered: row 2 indicates a lower pay package, but higher benefits than
row 3. In general, there is only a partial order between vectors. Vector x_i is
called a Pareto optimal vector in the set of vectors X if there is no other
vector in X that has all parameters equal to or greater than those of x_i. A
set X can have several Pareto optimal vectors. These vectors form a Pareto
optimal border.
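
The Pareto optimal border can be computed directly from this definition. The sketch below is a minimal quadratic-time version and not part of Bruegel; the function names are assumptions, and dominance is taken in the usual strict sense (at least as good everywhere and better somewhere) so that identical rows do not eliminate each other.

```python
def dominates(u, v):
    """True if u is >= v in every coordinate and > v in at least one."""
    return (all(a >= b for a, b in zip(u, v))
            and any(a > b for a, b in zip(u, v)))


def pareto_optimal(vectors):
    """Return the vectors that no other vector in the list dominates."""
    return [v for v in vectors
            if not any(dominates(u, v) for u in vectors)]


# Hypothetical (pay, benefits) scores; larger is better in both.
two_jobs = [(1, 2), (2, 1)]             # incomparable, like rows 2 and 3
three_jobs = [(3, 3), (1, 2), (2, 1)]   # the first job dominates the rest
```

For `two_jobs` both vectors are on the border, since neither dominates the other; adding a job that is better on both counts leaves only that job on the border.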



Table 8. Iconic sentences that summarize open positions

7.3 Visual correlation for medical research

Our investigations have shown that medical databases can specifically 
benefit from iconic visualization, e.g., breast cancer databases. Figure 16 
illustrates such an application. 

It is also interesting to note that here we provide not only an annotation 
of alphanumeric data but also an annotation of the image database associated 
with the mammography X-ray images. Image information is especially ap- 
propriate for iconization. 

The shape of a tumor can be sketched in an icon more easily and precisely
than it can be described in text. Figure 16 shows records of several patients,
where the 12 columns represent features of a mammogram such as tissue
density, tumor shape, tumor size, and calcification. The last column indicates
the type of tumor, benign or malignant, where the red shape indicates cancer
and the white shape indicates a benign tumor.

Figure 16. Iconic representation of breast cancer X-ray images

Numeric attribute values are also shown in this representation next to 
icons. However, numeric values are much less helpful for insight about fea- 
ture patterns. For instance, the red/white tumor icons in the last column are 
immediately identified as cancer positive and cancer negative cases while 
encoding, say, 1 for cancer cases and 0 for benign cases is purely arbitrary. 

There are many more reasons for an iconic annotation of medical images
and databases: iconic annotations aid navigation through a set of patients’
records, aid searching visually for patients with specific features, aid in
correlating patients, aid in discovering useful cancer and benign patterns in
a database, and aid in supporting diagnosis and treatment. All these
advantages are based on the unique human abilities of parallel image
processing and quick discovery of visual patterns, which were developed
over millions of years of evolution.



8. CONCLUSIONS 

This chapter described the basic concepts of the Bruegel iconic visualiza- 
tion and visual correlation system. The description includes Bruegel func- 
tionality and Bruegel’s ability to compress information via iconic, semantic 
zooming, and dynamic iconic sentences. The chapter provides a brief de- 
scription of Bruegel’s architecture and tools. 

The formal Bruegel iconic language for automatic icon generation is also 
outlined. BGL specifies the layering of iconographic elements into complex 
icons in order to represent textual content in a space efficient manner. 
Dynico is a supporting tool for BGL that aids in generating complex
icons dynamically. SDGravTree, described briefly in section 1, is another
supporting tool for BGL, which generates complex 3-D trees of icons
dynamically. Other graphical tools in Bruegel support BGL by correlating
visually complex objects and events.

Several case studies describe the capabilities of the Bruegel iconic archi- 
tecture which can be used for the visual correlation of a variety of tasks from 
terrorist events, to file system navigation, to drug trafficking, to criminal re- 
cord presentation, to the real estate and job markets, and finally to medical 
research, diagnosis and treatment. 

Future research will look at the development of an iconographic algebra
that fits the visual correlation tasks. It will also address scriptable
interactivity and the incorporation of XML-based data representation
conventions such as the DARPA Agent Markup Language (DAML). The architecture
will be expanded to accommodate more tasks such as pattern matching, iconic
optimization and decision making.

10. EXERCISES AND PROBLEMS

1. Design a hierarchy of concepts for job market as an iconized ontology 
with at least three levels, where each concept has its icon. This ontology 
can permit you to build semantic zooming. To do this describe with 
icons three job offerings using this iconized ontology at each level. You 
can animate your semantic zooming in PowerPoint. 

Advanced 

2. Design a set of iconels for exercise 1 that will permit you to build your 
icons dynamically on demand. Build a formal language for such iconel 
combinations including iconel repositioning and resizing. 

3. Offer a new application domain for iconized ontology and work out
exercises 1 and 2 for this domain.






4. Suggest an advanced visual correlation architecture, based on the Bruegel
architecture, that permits discovering patterns between visual sentences
recorded as iconic sequences.

Abstract: We introduce two dynamic visualization techniques using
multi-dimensional scaling to analyze transient data streams such as
newswires and remote sensing imagery. While the time-sensitive nature of
these data streams requires immediate attention in many applications, the
unpredictable and unbounded characteristics of this information can
potentially overwhelm many scaling algorithms that require a full
re-computation for every update. We present an adaptive visualization
technique based on data stratification to ingest stream information
adaptively when influx rate exceeds processing rate. We also describe an
incremental visualization technique based on data fusion to project new
information directly onto a visualization subspace spanned by the singular
vectors of the previously processed neighboring data. The ultimate goal is to
leverage the value of legacy and new information and minimize re-processing
of the entire dataset in full resolution. We demonstrate these dynamic
visualization results using a newswire corpus, a remote sensing imagery
sequence, and a hydroclimate dataset.

Key words: Dynamic Visualization, Text Visualization, Remote Sensing Imagery,
Hydroclimate Dataset, Transient Data Stream



1. INTRODUCTION 

Advancements in telecommunications and high-speed networks have re- 
cently created a new category of digital information known as data streams 
[Babcock, Babu, Datar, Motwani & Widom, 2002]. This time-varying
information has the unique characteristic of arriving continuously,
unpredictably, and unboundedly without any persistent patterns. Data stream examples
include newswires, Internet click streams, network resource measurements, 
phone call records, and remote sensing imagery. The increasing demands of 
immediate analyses and actions on these transient data streams in many 
time-sensitive applications such as Homeland Security have spawned a se- 
ries of investigations [Babu & Widom, 2001; Cortes, Fisher, Pregibon, 
Rogers & Smith, 2000; Domingos & Hulten, 2000; O’Callaghan, Mishra,
Meyerson, Guha & Motwani, 2002] to query, mine, and model the information
through nontraditional approaches. This chapter focuses on finding a
visual-based solution for this fast-growing research area, with demonstrations
using text, imagery, and climate streams.

Generally, visualizing transient data streams requires fusing a large 
amount of previously analyzed information with a smaller amount of new 
information. This new information is at least as important as its larger coun- 
terpart because the resultant visualization is entirely dependent on the data- 
set and the user parameters applied to it. Thus, the whole dataset must be 
reprocessed in full resolution in order to capture the finest details. In reality, 
this is a challenging task given the unbounded and unpredictable nature of 
the streams. 

Our first objective is to develop an adaptive visualization technique that 
allows one to get the best understanding of the transient data streams adap- 
tively during critical moments when influx rate exceeds processing rate. Our 
approach is built on the concept of data stratification that intelligently re- 
duces the data size in exchange for substantial reduction of processing time. 
Although we only use classical multidimensional scaling (MDS) [Cox & 
Cox, 1994] in our investigation, our adaptive technique will work well with 
other scaling methods because we only modify the data, not the scaling algo- 
rithms that process it. 

Our second objective is to develop an incremental visualization technique 
that allows one to project a certain amount of new information incrementally 
onto an orthogonal subspace spanned by the most important singular vectors 
of the previously processed data. The design is based on a multiple sliding 
window concept that uses dominant Eigenvectors obtained from a large data 
window to accommodate the information from a smaller data window with- 
out reprocessing the entire dataset. 

The primary visualization output of an MDS process is a low- 
dimensional scatterplot in which pairwise distances between any points re- 
flect the similarities of the items represented by the points. Because a large 
part of our work is based on progressive approximation and adaptation, er- 
ror-tracking plays a vital role in showing the viability of our work. We use 
both visual- and computational-based means extensively to compare multiple
scatterplots simultaneously and report their discrepancies.

Our ultimate goal is to leverage the value of legacy and new information 
and minimize re-processing the entire dataset in full resolution. Our ap- 
proach is to turn a prevailing and widespread visualization tool (i.e., MDS), 
which was initially developed for analyzing static data collections, into a 
dynamic data analysis tool for transient data streams. This chapter explains 
the motivations and provides critical technologies to accomplish such a dy- 
namic analysis environment. 



2. RELATED WORK 

Although the study of data streams is relatively new to the visualization 
community, it gradually has become one of the most pressing and difficult 
data problems in today’s information technology (IT) world. Recently, [Bab- 
cock, Babu, Datar, Motwani & Widom, 2002] presented an overview paper 
that defines the topic, describes the background, and introduces challenging 
issues of this young research area. Among the hot research topics are
dynamic query [Babu & Widom, 2001; O’Callaghan, Mishra, Meyerson, Guha
& Motwani, 2002] and data mining [Cortes, Fisher, Pregibon, Rogers &
Smith, 2000; Domingos & Hulten, 2000] of the transient streams.

MDS has always been an important part of information visualization 
studies. [Wise, Thomas, Pennock, Lantrip, Pottier, Schur & Crow, 1995] and 
[Wise, 1999] used MDS to analyze large corpora. [Bentley & Ward, 1996] 
studied the stress property of a class of MDS known as non-metric multidi- 
mensional scaling [Cox & Cox, 1994]. [Wong & Bergeron, 1997] applied 
MDS in a high-dimensional brushing and linking visualization environment. 
And more recently, [Morrison, Ross & Chalmers, 2002] reported a new 
MDS technique that algorithmically outperforms all previous implementa- 
tions. Many of these MDS-based applications can potentially take advantage 
of our adaptive technique and further improve performance. 

This chapter uses text, imagery, and climate streams for demonstrations. 
The text visualization community has largely been influenced by two 
groundbreaking works: 1) Salton and McGill’s Vector Space Model (VSM) 
[Salton & McGill, 1983], which represents the documents as numerical vec- 
tors, and 2) Deerwester et al.’s Latent Semantic Indexing (LSI) [Deerwester, 
Dumais, Furnas, Landauer & Harshman, 1990], which effectively projects 
the vectors into an Eigenspace based on a classical MDS design. Many
successful commercial text visualization systems today, such as Aureka
[Aureka, 2003], OmniViz [OmniViz, 2003], SPIRE [Wise, 1999; Wise,
Thomas, Pennock, Lantrip, Pottier, Schur & Crow, 1995], Starlight [Risch,
Rex, Dowson, Walters, May & Moon, 1997], and VxInsight [VxInsight,
2003], were developed based on these two models.

Although MDS is frequently applied to text analysis, researchers also use 
MDS to visualize images and climate modeling information. For example, 
[Rodden, Basalaj, Sinclair & Wood, 1999] discuss a novel image scatterplot 
without overlapping and [Wong, Foote, Leung, Adams & Thomas, 2000] 
present a data signature scheme to study the trend of climate modeling data- 
sets. Of particular note is that the prior work presented here has assumed a 
static data collection, which is very different from the dynamic data streams 
discussed in this chapter. 



3. DEMONSTRATION DATASET AND PREPROCESSING

We use a newswire corpus, a remote imagery sequence, and an observed 
hydroclimate dataset for demonstrations. We describe the datasets and pre- 
processing steps to generate the required vectors (which represent individual 
units) and matrices (which represent the whole dataset) for MDS analysis. 

3.1 Document Corpus 

The demo corpus has 3,298 news articles collected from open sources 
during April 20-26, 1995. It has a strong theme associated with the bombing 
of the U.S. Federal Building in Oklahoma, the O.J. Simpson trial, and the 
French elections. 

The first step in processing the corpus is to identify a set of content- 
bearing words [Bookstein, Klein & Raita, 1998] from the documents. Words 
separated by white spaces in a corpus are evaluated within the context of the 
corpus to assess whether a word is interesting enough to be a topic. The co- 
occurrence or lack of co-occurrence of these words in documents is used to 
evaluate the strengths of the words. 

The second step is to construct the document vectors for the corpus. A 
document vector, which is an array of real numbers, contains the weighted 
strengths of the interesting words found in the corresponding document. 
These vectors are normalized and the result is a document matrix that repre- 
sents the corpus. In our example, a document vector contains 200 numbers. 
Because there are 3,298 documents, the dimensions of the document matrix 
are 3,298x200. Details of our text engine can be found in [Wise, Thomas, 
Pennock, Lantrip, Pottier, Schur & Crow, 1995; Wise, 1999]. 

3.2 Remote Sensing Imagery

The remote sensing imagery sequence used in our demonstration was 
taken by an aircraft over the semi-desert area in Eastern Washington. The 
aircraft was equipped with a hyperspectral sensor that could take multiple 
images of the same locations simultaneously in different spectral bands.
Figure 1a illustrates the basic concept of the scanning operation.




Figure 1. a) An illustration of the operation. b) A sketch of a hyperspectral image set with
169 spectral bands ranging from very short to very long bands. c) A color infrared (CIR)
image of the semi-desert area in Eastern Washington. See also color plates.

The processing of the remote imagery information is straightforward. We 
want to extract features of the area by analyzing the similarities of the image 
pixels from different spectral bands. In our examples in Sections 6.4 and 8,
the image in each spectral band (or layer) has 32x128=4,096 pixels, as
illustrated in Figure 1b. Figure 1c depicts a color infrared (CIR) image shot in
the infrared band. A pixel vector, in this case, contains image information of the
same pixel position across the 169 spectral bands. In other words, each pixel 
position establishes a pixel vector. Because there are 4,096 pixels in each 
image, the dimensions of the pixel matrix are 4,096x169. For larger images 
we can sub-sample the image and control the size of the matrix. 
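
As a rough illustration of how such a pixel matrix can be assembled, the
following Python sketch stacks the values found at each pixel position across
all bands. The function names pixel_matrix and subsample, the nested-list image
layout, and the toy values are assumptions made for this example only; they are
not taken from the implementation described in this chapter.

```python
def pixel_matrix(bands):
    """Build one pixel vector per pixel position.

    bands: list of B images, each a list of equally sized rows (assumed layout).
    Returns a (rows * cols) x B matrix as a list of lists.
    """
    rows, cols = len(bands[0]), len(bands[0][0])
    return [[band[r][c] for band in bands]
            for r in range(rows) for c in range(cols)]


def subsample(bands, step):
    """Keep every step-th row and column of each band to shrink the matrix."""
    return [[row[::step] for row in band[::step]] for band in bands]


# Toy data with the shape used in the chapter: 169 bands of 32 x 128 pixels.
bands = [[[float(b + r + c) for c in range(128)] for r in range(32)]
         for b in range(169)]
matrix = pixel_matrix(bands)
print(len(matrix), len(matrix[0]))  # 4096 169
```

Sub-sampling with a step of 2 in this sketch would cut the number of pixel
vectors to one quarter while leaving the number of bands unchanged.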

3.3 Hydroclimate Dataset 

The hydroclimate dataset used in our study consists of daily maximum 
and minimum surface temperature and precipitation gridded at one-eighth
degree resolution over the western US from 1949 to 2000. Gridded data were produced
following the methodology outlined by Maurer et al. based on daily
observations made at the National Oceanic and Atmospheric Administration
(NOAA) Cooperative Observer (Co-op) stations. A study using this observa- 
tion dataset with a second simulated dataset was conducted in [Leung, Qian 
& Bian, 2003; Leung, Qian, Bian & Hunt, 2003]. 

In our study, 6,155 sampling points are selected regularly from the grid- 
ded dataset that covers the entire western US. The surface temperature and 
precipitation information is combined using a series of Gaussian-weighted
spatial smoothing, Fast Fourier Transform, and histogram processes to form
a hydroclimate vector with 24 real numbers, which represents the climate
characteristics and properties of that location. In other words, the
dimensions of the hydroclimate matrix in our study are 6,155x24.



4. MULTIDIMENSIONAL SCALING 



MDS is a prevailing technique used to visualize high-dimensional data- 
sets. There are a variety of proven MDS algorithms designed for different 
analytical purposes. (Readers are directed to [Cox & Cox, 1994; MDS, 
2003] for a comprehensive overview.) Our investigation focuses on a class 
of MDS known as classical scaling [Cox & Cox, 1994]. To show the flexi- 
bility of our design approach, we also include an example in Section 6.3 us- 
ing a least-squares-based scaling technique known as Sammon Projection 
[Sammon, 1969]. 




Figure 2. A document scatterplot generated by MDS 



Given a high-dimensional dataset (a set of similar data objects repre- 
sented by numerical vectors), MDS generates a low-dimensional configura- 
tion — like a 2-D scatterplot — such that the pairwise distances between any 
points in the low-dimensional space approximate the similarities between the 
vectors that represent the points. For example, Figure 2 shows a scatterplot
with 3,298 points (each point represents a document vector) generated by a
classical MDS process using the corpus described in Section 3.1. In this
example, documents with similar themes are clustered together as annotated.
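
To make the classical scaling step concrete, here is a minimal Python sketch
under stated assumptions: dissimilarities are plain Euclidean distances, the
two dominant Eigenvectors are found by power iteration with deflation (chosen
only to keep the example free of external libraries), and the function name
classical_mds is hypothetical. It is meant for tiny inputs and is not the
implementation benchmarked later in this chapter.

```python
import math


def classical_mds(vectors, dims=2, iters=500):
    """Classical scaling of row vectors into `dims` coordinates (a sketch)."""
    n = len(vectors)
    # Squared Euclidean distances between all pairs of vectors.
    d2 = [[sum((a - b) ** 2 for a, b in zip(u, v)) for v in vectors]
          for u in vectors]
    row_mean = [sum(r) / n for r in d2]
    grand_mean = sum(row_mean) / n
    # Double centering turns squared distances into an inner-product matrix B.
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    def matvec(m, v):
        return [sum(mij * vj for mij, vj in zip(mi, v)) for mi in m]

    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration from a fixed, arbitrary start vector.
        v = [math.sin(i + k + 1.0) for i in range(n)]
        for _ in range(iters):
            w = matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(b, v)))  # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate B so the next pass finds the next dominant Eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Four corners of a 3 x 4 rectangle embedded in 3-D.
corners = [(0, 0, 0), (3, 0, 0), (3, 4, 0), (0, 4, 0)]
layout = classical_mds(corners)
```

For this rectangle the 2-D layout reproduces the original pairwise distances up
to rotation and reflection, which is the property the scatterplots rely on.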



5. ADAPTIVE VISUALIZATION USING 
STRATIFICATION 

We present an adaptive visualization technique based on data stratifica- 
tion to substantially reduce the processing time of the streams and yet main- 
tain the overall integrity of the MDS visualization output. The data ingest 
scheme that supports the technique is illustrated in Figure 3. If the primary 
data processing route (the bottom pipe) has overflowed, the data are
redirected to a secondary one (the middle pipe), which generates a coarser
version of the visualization but at a much faster processing rate. If the
middle pipe also overflows, the data go to the upper (red) pipe, and so on. The
two data stratification strategies presented in this chapter are vector dimen- 
sion reduction and vector sampling. 
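
The routing decision in this ingest scheme can be pictured with a short Python
sketch. The function name choose_pipe, the notion of a numeric backlog, and the
capacity values are all hypothetical; the chapter does not specify how overflow
is detected, so this is only one plausible reading.

```python
def choose_pipe(backlog, capacities):
    """Return the index of the first pipe that can absorb the backlog.

    capacities: backlog limits ordered from the full-resolution pipe (index 0)
    to the coarsest pipe; the values are hypothetical tuning parameters.
    """
    for level, capacity in enumerate(capacities):
        if backlog <= capacity:
            return level
    return len(capacities) - 1  # everything overflowed: use the coarsest pipe


capacities = [100, 400, 1600]
levels = [choose_pipe(b, capacities) for b in (50, 250, 5000)]
print(levels)  # [0, 1, 2]
```

Each higher index stands for a more aggressively stratified (coarser but
faster) processing route.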




Figure 3. An adaptive ingest scheme for stream visualization 



5.1 Vector Dimension Reduction 

Our first stratification strategy is to cut down the dimensions of the data 
vectors. The biggest challenge is to reduce the physical size of the vectors 
but maintain most of their important contents. We accomplish this by apply- 
ing dyadic wavelets [Mallat 1998; Strang & Nguyen, 1997] to decompose 
individual vectors (and thus compress them) progressively. While the theory 
of wavelets is extensive, our experiments show that the rectangular
(piecewise-constant) Haar wavelets perform well in all our visualizations. Not
surprisingly, the basic Haar wavelet also outperforms all the other wavelet candidates in
processing time, which is considered a top priority for analyzing data 
streams [Gilbert, Kotidis, Muthukrishnan & Strauss, 2001]. 

Figure 4 shows an example of two consecutive wavelet decompositions 
on a document vector randomly selected from the demo corpus (described in 
Section 3.1). Figure 4a is the original vector with 200 terms. Figure 4b is the
result of the first wavelet decomposition with 100 terms. Figure 4c is the
result of the second wavelet decomposition with 50 terms. Because Haar
belongs to the dyadic wavelet family, each wavelet application reduces
the vector dimension by 50%.
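
The halving behavior can be sketched in a few lines of Python. This example
keeps only the approximation (low-pass) half of a Haar step and uses plain
pairwise averages; the orthonormal Haar transform scales pair sums by
1/sqrt(2) instead and also produces detail coefficients. The function name
haar_reduce, the padding rule for odd lengths, and the stand-in vector are
assumptions for illustration.

```python
def haar_reduce(vector):
    """One Haar approximation step: average each adjacent pair of terms."""
    if len(vector) % 2:
        vector = list(vector) + [vector[-1]]  # pad odd lengths (an assumption)
    return [(vector[i] + vector[i + 1]) / 2.0
            for i in range(0, len(vector), 2)]


doc = [float(i % 7) for i in range(200)]  # stand-in for a 200-term vector
level1 = haar_reduce(doc)
level2 = haar_reduce(level1)
print(len(doc), len(level1), len(level2))  # 200 100 50
```

Because each step touches every term once, the cost is linear in the vector
length, which is consistent with the speed advantage noted above.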







Figure 4. a) A document vector with 200 terms, b) Result of the first wavelet decomposition 
with 100 terms, c) Result of the second decomposition with 50 terms. 

While the example in Figure 4 shows the feature-preserving property of 
wavelets on individual vectors, the next example demonstrates the accuracy 
of the resultant vectors in generating visualizations using MDS. We start 
with the same scatterplot shown in Figure 2, which is generated using a full- 
resolution matrix of the demo corpus. In order to give visual identities to the 
scatter points, we apply a K-means [Seber, 1984] clustering process to the 2-D
scatterplot and subdivide the points into four clusters. A K-means clustering
process tries to minimize the sum of squared Euclidean distances from each data
point to its cluster centroid. Each cluster receives a unique color (shown in Figure
5a as magenta, cyan, grey, and yellow) for point identification.
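
For readers who want to reproduce the coloring step, the following is a minimal
sketch of Lloyd's algorithm, the usual iteration behind K-means, on 2-D points.
The function name kmeans and the deterministic choice of starting centroids are
assumptions of this example; the chapter does not state which initialization
was used.

```python
def kmeans(points, k, iters=100):
    """Lloyd's algorithm on 2-D points; returns one cluster label per point."""
    n = len(points)
    # Deterministic start: k points spread evenly through the input order.
    centroids = [points[(i * n) // k] for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
        if new_labels == labels:
            break
        labels = new_labels
    return labels


scatter = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans(scatter, 2))  # [0, 0, 0, 1, 1, 1]
```

Each label can then be mapped to a display color so that the same point keeps
its identity across the scatterplots being compared.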







Figure 5. Scatterplots generated by MDS using document vectors with sizes equal to a) 200,
b) 100, and c) 50 terms. See also color plates.






Using wavelets, we then progressively reduce the dimensions of the 
document vectors from 200, to 100, and then to 50. Each reduction is fol- 
lowed by an MDS process; the visualization results are shown in Figures 5b 
and 5c. Although the orientations and spreads of the scatter points vary 
slightly in Figures 5b and 5c, major features such as clustering and separa- 
tion remain. 

5.2 Vector Sampling 

Our second stratification strategy is to reduce the number of data vectors 
based on sampling. We use a regular sampling technique to obtain an even 
data distribution in our example. Other sampling options such as a uniform 
random sampling function can also be applied. 

We first repeat the initial two steps (i.e., generate scatterplot using all 
3,298 document vectors and assign color identities to each scatter point) of 
the last example. Instead of reducing the dimensions of the vectors, this time 
we progressively reduce the number of document vectors by 50% every time 
using a regular sampling method. Each sampling process is followed by an- 
other MDS to project a scatterplot based on the sampled input. The results 
are shown in Figure 6. 
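
The sampling step itself is simple, as the following Python sketch suggests.
The function names are hypothetical, and dropping an incomplete trailing group
is an assumption chosen so that the counts match those quoted in Figure 6; the
random variant is included only to show the alternative mentioned above.

```python
import random


def regular_sample(vectors, step=2):
    """Keep every step-th vector, dropping any incomplete trailing group."""
    usable = len(vectors) - len(vectors) % step
    return vectors[:usable:step]


def random_sample(vectors, fraction, seed=0):
    """Uniform random alternative; the seed only makes the sketch repeatable."""
    rng = random.Random(seed)
    return rng.sample(vectors, int(len(vectors) * fraction))


docs = list(range(3298))          # stand-ins for the document vectors
half = regular_sample(docs)
quarter = regular_sample(half)
print(len(half), len(quarter))    # 1649 824
```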




Figure 6. Scatterplots generated by MDS using a) 3298, b) 1649, and c) 824 document 
vectors. See also color plates. 



The three visualizations in Figure 6 demonstrate that even though we 
substantially reduce the number of vectors for the MDS process, the shape or 
spread of the scatterpoints remains more or less the same. This phenomenon 
can be explained by the stability of the two most important Eigenvectors 
generated by the highly related document vectors. This property helps sim- 
plify the structures of a complex dataset for visual analytics and potentially 
speeds up the time requirement for critical decision making. It also lays the 
foundation of the data fusion method discussed later in this chapter. 



6. DATA STRATIFICATION OPTIONS AND RESULTS

In this section, we discuss the results of the two stratification strategies 
and their impacts on the resultant visualizations. Our discussion focuses on 
computation performance and the flexibility in dealing with different data 
types and different scaling methods. 

6.1 Scatterplot Matrix 

To improve the visualization for comparison and evaluation, we progres- 
sively combine the two approaches and concatenate their results into a scat- 
terplot matrix. The scatterplot matrix in Figure 7 shows the consequences of 
reducing document vectors (row) versus reducing vector dimensions (col- 
umn). 

The results indicate that although the shape of the point distribution 
changes to some extent, the overall integrity of the visualizations such as 
clustering and separation remain intact. The fact that the cluster borders re- 
main clear and crisp in all nine matrix panes indicates a very positive result 
of our approach. (We will revisit the fidelity issue in Section 7.) 

6.2 Computation Performance 

Perhaps a more important goal of the stratification effort is to
substantially reduce the computation time of the MDS process. While all our
numerical programs are implemented locally using C++ on a SUN Ultra 10, we
chose a commercial package, Mathematica 4.2 [Mathematica, 2003] running on a
Macintosh G4 with 1 GB of memory, to report the computation performance. The
Eigenvalue function used in Mathematica is algorithmically compatible with
those championed by Netlib [Netlib, 2003].

Table 1 shows the benchmark results of our study. The top row (in blue) 
shows the number of dimensions in the document vectors. The left column 
(in green) shows the number of document vectors included in the computa- 
tion. The other nine cells are computation time measured in wall clock sec- 
onds. The corresponding scatterplot of each cell is shown in Figure 7. 



Table 1. Execution times measured in wall clock seconds

                        200 Dimensions   100 Dimensions   50 Dimensions
All (3,298) Documents   34.90s           9.50s            2.62s
1/2 (1,649) Documents   14.80s           4.78s            1.52s
1/4 (824) Documents     8.83s            2.58s            0.89s


Figure 7. A scatterplot matrix demonstrating the impact of reducing document vectors (rows)
versus reducing vector dimensions (columns, D = 200, 100, and 50) using the classical scaling
technique. See also color plates.



6.3 Sammon Projection 

So far we have focused only on the classical MDS [Cox & Cox, 1994] in 
our investigation. To show the flexibility of our adaptive visualization
technique, we demonstrate a second scaling example using a least-squares MDS
technique known as Sammon Projection [Sammon, 1969].

Classical MDS treats the dissimilarity between two vectors directly as a
Euclidean distance, whereas least-squares MDS fits the low-dimensional
distances to a continuous monotonic function of the dissimilarities in the
least-squares sense. A particular strength of a Sammon Projection over a
classical MDS projection in visualization is that the former usually has
fewer overlapping clusters due largely to its non-linear mapping approach.
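
As a point of reference, the quantity that a Sammon Projection minimizes is the
stress defined in [Sammon, 1969]: the squared differences between original and
projected distances, each divided by the original distance, summed over all
pairs and normalized by the sum of the original distances. The Python sketch
below only evaluates this stress for a given layout; it does not perform the
iterative minimization, and the function name sammon_stress and the handling of
coincident points are assumptions of this example.

```python
import math


def sammon_stress(high, low):
    """Sammon's stress between original vectors and their low-D projections."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    total, error = 0.0, 0.0
    n = len(high)
    for i in range(n):
        for j in range(i + 1, n):
            d_high = dist(high[i], high[j])
            if d_high == 0.0:
                continue  # coincident inputs carry no weight in this sketch
            d_low = dist(low[i], low[j])
            total += d_high
            error += (d_high - d_low) ** 2 / d_high
    return error / total if total else 0.0


# Two points 5 units apart, projected to a line 4 units apart.
stress = sammon_stress([(0, 0), (3, 4)], [(0,), (4,)])
```

Dividing by the original distance gives small distances more weight, which is
one way to see why local structure tends to be preserved better than in a
linear projection.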

Figure 8 shows a re-execution of Figure 7 using the Sammon Projection
technique. Although the visualization results look very different from those
in Figure 7 because of the preservation of higher dimensional distances, the
impact of the stratification is very much like that in Figure 7. Most of
the scatter points are able to maintain their original positions
and orientations. The four point colors (red, green, blue, and orange), which 
are assigned after a K-means clustering process, clearly show the integrity of
the visualization after substantial dimension and vector reductions.



Figure 8. A scatterplot matrix demonstrating the impact of reducing document vectors (rows)
versus reducing vector dimensions (columns, D = 200, 100, and 50) using the Sammon Projection
technique. See also color plates.



6.4 Remote Sensing Imagery 

Our adaptive visualization technique can also be used to visualize image 
streams such as hyperspectral remote sensing imagery. A major motivation 
to include all spectral bands in the image analysis is that subjects that appear 
identical in one spectral band (like visible color) may be very different from 
each other if we look into all possible spectral bands of the images. The goal 
of this example is to show that we can apply the same stratification ap- 
proaches to analyze imagery streams. 

Our next example uses the hyperspectral image set discussed in Section 
3.2. We first apply classical MDS to scale the pixel vectors, followed by a
K-means process to assign unique colors to eight scatter point clusters. We
then stratify the vectors progressively and generate the MDS visualization 
after each reduction. Figure 9 shows the scatterplot matrix with results from 
different degrees of stratifications. 

Figure 9. A scatterplot matrix demonstrating the effects of reducing pixel vectors (rows) versus
reducing vector dimensions (columns) using remote sensing imagery. See also color plates.

In addition to the close proximity among the nine scatterplots as shown in 
previous examples, this time we can use a different approach to evaluate the 
accuracy of the results. If we map the colors of individual pixels from Figure 
9 back to Figure 1c, we obtain the image shown in Figure 10. By comparing
features in Figure 1c to those in Figure 10, we see that all nine scatterplots
correctly identify different features of the original image and separate them 
into different clusters. 




Figure 10. Colors generated by the scatterplot clusters clearly identify different features of
the image shown in Figure 1c. See also color plates.







7. SCATTERPLOT SIMILARITY MATCHING 



So far our discussion on scatterplot comparison has been based on visual 
means. Visual-based comparisons are fast and convincing. However, they 
need human attention and interpretation and thus are not practical in some 
applications. Besides, visual-based pairwise comparison is not always
reliable if the scatterplots do not reflect any visible patterns. For example,
Figure 11 shows two scatterplots filled with white noise. As we will reveal later,
these two scatterplots are very similar. But our eyes are fooled by the lack of 
visible patterns that are required to correlate the images. 

We want to find a reliable computation method to evaluate the similarity 
between two scatterplots so that we have a metric to measure the fidelity of 
our approximations generated by stratification and later our fusion tech- 
nique. In statistics studies, the class of techniques of matching two similar 
n-D configurations and producing a measure of the match is commonly 
known as Procrustes analysis [Cox & Cox, 1994]. 











Figure 11. Two scatterplots filled with white noise 



7.1 Procrustes Analysis 

We implemented a Procrustes program that can match scatterplots in any 
number of dimensions. We also assume that the one-to-one correspondence 
information among the scatter points is known. While the theory behind the 
analysis is beyond the scope of this chapter, we can comfortably summarize 
our implementation in four basic steps. Given two 2-D scatterplots X and Y,
where X and Y are (n x 2) matrices, the steps to match X to Y and report a
measure of the match are:

1. Translate the two scatterplots so that they both have their centroids at the
origin by subtracting the mean coordinates of the scatterplot from each of
its points.

2. Rotate X to match Y by multiplying X with $(X^T Y Y^T X)^{1/2} (Y^T X)^{-1}$.

3. Dilate the scatter points in X by multiplying each of them with
$\operatorname{tr}(X^T Y Y^T X)^{1/2} / \operatorname{tr}(X^T X)$.

4. Compute the matching index between X and Y:

$1 - \{\operatorname{tr}(X^T Y Y^T X)^{1/2}\}^2 / \{\operatorname{tr}(X^T X)\,\operatorname{tr}(Y^T Y)\}$.

The goal is to seek the isotropic dilation and the rigid translation, reflec- 
tion, and rotation required to match one scatterplot with the other. The 
matching index calculated in Step 4 ranges from zero (best) to one (worst). 
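
The matching index of Step 4 can be evaluated for the 2-D case without a
general matrix square root, as the Python sketch below suggests. It relies on
two facts: the trace of the square root in Step 4 equals the sum of the
singular values of the 2 x 2 matrix M = X^T Y, and for a 2 x 2 matrix the
square of that sum equals the squared Frobenius norm of M plus twice the
absolute value of its determinant. This shortcut and the function name
procrustes_index are choices made for this example, not a description of our
n-dimensional program.

```python
import math


def procrustes_index(x, y):
    """Matching index of two 2-D scatterplots with known correspondence."""
    n = len(x)

    def centered(points):
        mx = sum(p[0] for p in points) / n
        my = sum(p[1] for p in points) / n
        return [(p[0] - mx, p[1] - my) for p in points]

    xc, yc = centered(x), centered(y)          # Step 1: translate to origin
    # Entries of the 2 x 2 matrix M = X^T Y.
    m00 = sum(a[0] * b[0] for a, b in zip(xc, yc))
    m01 = sum(a[0] * b[1] for a, b in zip(xc, yc))
    m10 = sum(a[1] * b[0] for a, b in zip(xc, yc))
    m11 = sum(a[1] * b[1] for a, b in zip(xc, yc))
    tr_xx = sum(a[0] ** 2 + a[1] ** 2 for a in xc)
    tr_yy = sum(b[0] ** 2 + b[1] ** 2 for b in yc)
    frob2 = m00 ** 2 + m01 ** 2 + m10 ** 2 + m11 ** 2
    det = m00 * m11 - m01 * m10
    # (sum of singular values of M)^2, valid for 2 x 2 matrices only.
    numerator = frob2 + 2.0 * abs(det)
    return 1.0 - numerator / (tr_xx * tr_yy)   # Step 4


# A layout and a copy that is rotated by 36 degrees, doubled, and shifted.
base = [(0, 0), (1, 0), (1, 2), (-1, 3), (2, -1)]
t = math.radians(36)
moved = [(2 * (a * math.cos(t) - b * math.sin(t)) + 5,
          2 * (a * math.sin(t) + b * math.cos(t)) - 3) for a, b in base]
index_same = procrustes_index(base, moved)
```

For the rotated, scaled, and shifted copy the index is zero up to
floating-point error, since the index is unaffected by translation, rotation,
reflection, and isotropic dilation.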

For example, we can match the scatterplot in Figure 11b to the one in
Figure 11a by a rotation of -36 degrees (Step 2) followed by a scaling factor of 2
(Step 3). The matching index (Step 4) of this Procrustes analysis is
2.21008x10'^°, which indicates the two scatterplots are nearly identical.

Bear in mind that we use Procrustes analysis to measure the similarity 
between two 2-D scatterplots, not the original high-dimensional datasets. In 
other words, we merely use Procrustes analysis as a means to remove much 
of the human subjectivity when we peruse the scatterplot patterns. 

7.2 Procrustes Analysis Results 

Table 2 shows the results of Procrustes analyses that were carried out on
the corpus scatterplots in Figure 7. The very low index values (from 0.016 to
0.14) in Table 2 indicate that all eight scatterplots generated by stratified
vectors are extremely similar to the full-resolution one using all 3,298
vectors with all 200 dimensions. These highly accurate results and the notable
time reduction of roughly 97% in generating one of them (from 34.90 s down to
0.89 s in Table 1) are strong evidence that the two demonstrated stratification
approaches are viable solutions for visualizing transient data streams.



Table 2. Matching indices between the eight document corpus scatterplots and the
original full resolution one shown in Figure 7

              200          100         50
All (3,298)   0.0 (SELF)   0.0224058   0.0841326
1/2 (1,649)   0.0162034    0.0513420   0.1114290
1/4 (824)     0.0329420    0.0620215   0.1417580



To further support our argument, we provide the matching results of the 
remote sensing imagery scatterplots shown in Figure 9 in Table 3. The 
matching indices listed in Table 3 are even lower than those listed in Table 
2. Even the worst case (1/4 dimensions, 1/4 vectors) has a matching index of
about 0.00006, which rounds to zero at four decimal places.

Table 3. Matching indices between the eight remote sensing imagery scatterplots and the
full resolution one shown in Figure 9

              169           84            42
All (4,096)   0.0 (SELF)    0.000004106   0.0000565361
1/2 (2,048)   0.000000279   0.000004136   0.0000567618
1/4 (1,024)   0.000004299   0.000007314   0.0000577721



8. INCREMENTAL VISUALIZATION USING FUSION

Our first visualization technique focuses primarily on the use of stratified 
vectors, which substitute the full-resolution ones to generate fast and accu- 
rate MDS scatterplots. The stratification effort alone, however, does not 
eliminate the requirement to re-process the entire dataset whenever new 
items arrive. Our next visualization technique, which is developed based on 
a data-fusion concept, addresses the re-processing issue by projecting new 
items directly onto an existing visualization without frequently re-processing 
the entire dataset. 

8.1 Robust Eigenvectors 

In Section 5.2, we observed that the visualization subspaces spanned by 
the two dominant Eigenvectors of our demo datasets are extremely resilient
to change. All the follow-on examples shown in Figures 6 to 9 indicate
only minor distortions even after a substantial amount of the input data is 
removed. 

To further explore this characteristic, we use the hyperspectral imagery 
(see Section 3.2 and Figure 1c) to study the similarity between the
Eigenvectors (and the corresponding scatterplots) generated from local regions versus
the entire dataset. To provide identities to individual pixels, we use the
image representation shown in Figure 10 to represent the hyperspectral imagery
instead of the one in Figure 1c throughout this discussion.

Our first step is to generate an MDS scatterplot (Figure 12a) using the
pixel vectors from the entire hyperspectral imagery. We then select and crop 
three smaller regions and generate three MDS plots (Figures 12b to 12d) us- 
ing only the pixel vectors from the corresponding regions. These local re- 
gions are selected because they contain diverse image features (i.e., purple, 
pink, and blue on left; yellow, green, and red on right; and a mixture of eve- 
rything in the middle) as reflected by the pixel colors. 

Figure 12. Scatterplot a) (top right) is generated from the demo imagery (top left).
Scatterplots b) to d) (bottom left) are generated from the corresponding cropped areas. Scatterplots e)
to g) are generated by extracting the scatter points from a) that are found in the corresponding
cropping windows which generate scatterplots b) to d). See also color plates.

Our next step is to generate three more scatterplots (Figures 12e to 12g)
using the corresponding pixel vectors found in Figures 12b to 12d. This time 
we use the Eigenvectors computed from the entire hyperspectral imagery 
(instead of the local cropped windows) to construct all three scatterplots. 
This can be done by reusing the coordinates of the selected scatterpoints 
from Figure 12a, which is constructed using Eigenvectors from the entire 
imagery. In other words, Figures 12b and 12e are generated using the same pixel
vectors, as are Figures 12c and 12f and Figures 12d and 12g. However,
Figures 12b to 12d use local Eigenvectors of the cropped regions whereas
Figures 12e to 12g use global Eigenvectors of the entire imagery.

The resultant scatterplots in Figures 12b to 12g show that the three
corresponding pairs (i.e., Figures 12b and 12e, Figures 12c and 12f, and Figures
12d and 12g) closely resemble each other. This visual-based conclusion is 
consistent with the near zero Procrustes matching indices shown in Table 4, 
which imply a close similarity among the pairs. 



Table 4. Procrustes matching indices of the three scatterplot pairs shown in Figure 12

Figures          12b vs. 12e   12c vs. 12f    12d vs. 12g
Matching Index   0.000718745   0.0000381942   0.000683066



Because the first Eigenvector is the line through the centroid of the scatter
points along which the variance of the projections is greatest (not necessarily
the direction of the greatest range or extent of the data) and the second
Eigenvector is orthogonal to the first one, these Eigenvectors tend to be very
robust to changes unless a substantial amount of disparate information is
added. The property is particularly noteworthy because of the frequently 
high similarities among neighboring data streams. This remarkable combina- 
tion becomes the foundation of our next visualization technique on data 
streams. 

8.2 Multiple Sliding Windows 

Figure 13 illustrates an overview of our incremental visualization
technique using multiple time frames, which we call sliding windows in our
discussion, in chronological order. After the Eigenvectors of a long
(blue) window are determined and a 2-D scatterplot is generated, newly
arrived individual vectors from the short (red) window are projected
directly onto the visualization subspace by simply computing the
dot-products between the incoming data vectors and the Eigenvectors of the
long window. So instead of repeatedly running the rather expensive
(O(n^3)) classical MDS function, or even the faster (O(n√n)) version
[Morrison, Ross & Chalmers, 2002], whenever new information arrives, one
can now obtain an almost instant visualization update by carrying out a
simple (O(m), m = vector dimension, m ≪ n) dot-product operation to
determine the point location of the new information in the scatterplot.
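
The projection step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the two Eigenvectors of the long window are taken as given (the values below are made up), the function names are hypothetical, and any centering that was applied when the Eigenvectors were computed would also have to be applied to each incoming vector.

```python
def dot(u, v):
    """Dot-product of two equal-length vectors: O(m) work."""
    return sum(ui * vi for ui, vi in zip(u, v))


def project(vector, eigenvectors):
    """Place one m-dimensional vector in the existing 2-D scatterplot."""
    first, second = eigenvectors
    return (dot(vector, first), dot(vector, second))


# Hypothetical unit-length Eigenvectors of the long window (m = 4).
long_window_eigenvectors = (
    (0.5, 0.5, 0.5, 0.5),
    (0.5, -0.5, 0.5, -0.5),
)

# Newly arrived vectors from the short window.
short_window = [(2.0, 0.0, 2.0, 0.0), (1.0, 1.0, 1.0, 1.0)]

points = [project(v, long_window_eigenvectors) for v in short_window]
print(points)  # [(2.0, 2.0), (2.0, 0.0)]
```

Each new point costs two dot-products of length m, independent of the number n of vectors already in the scatterplot.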



Figure 13. An illustration of our multiple sliding window design in
visualizing data streams (diagram labels: Sliding Direction, Long Window,
Short Window). See also color plates.

8.2.1 Hyperspectral Imagery 

We use the same hyperspectral imagery to demonstrate the concept of our
visualization technique. Figure 14a shows the ideal case, in which 100% of
the pixel vectors are used to generate the scatterplot by MDS. In Figure
14b, only 75% of the pixel vectors are projected onto the scatterplot by
MDS. The other 25% are projected by a dot-product function using the
Eigenvectors of the first 75%. Finally, in Figure 14c, only 50% of the
pixel vectors are projected by MDS. The other 50% are projected according
to the Eigenvectors of the first 50%.

The scatterplots in Figures 14b and 14c look similar to the
full-resolution one in Figure 14a, and the low Procrustes indices in Table
5 confirm that the three scatterplots are indeed close to each other.
These near-zero matching indices also support our argument that we can
obtain a fast and accurate overview of the entire dataset without
re-processing the entire dataset.



Table 5. Procrustes matching indices of the three scatterplots shown in Figure 14

Figures          14a vs. 14b    14a vs. 14c
Matching Index   000123405      0.00233882



8.2.2 Hydroclimate Dataset 

We pre-process the hydroclimate matrix, which is defined in Section 3.3,
in the same way as the other two datasets. MDS is first applied to the
6,155 hydroclimate vectors to form a scatterplot. A K-means process is
then used to subdivide the 10 most important components into 10 clusters.
Random colors are applied to each cluster for identification throughout
the demonstration, as depicted in Figure 15a. Finally, the colors of
individual vectors are mapped to the corresponding coordinates of the
western US map, as shown in Figure 15b.




Figure 14. a) The Eigenvectors of the scatterplot are computed using 100%
of the pixel vectors. b) The Eigenvectors of the scatterplot are computed
from 75% of the pixel vectors. The other 25% are projected onto the
scatterplot by dot-product. c) The Eigenvectors of the scatterplot are
computed from 50% of the pixel vectors. The other 50% are projected by
dot-product. See also color plates.

Similar to the previous CIR example in Figure 14, we start with a
scatterplot generated using 100% of the vectors, as shown in Figure 16a.
We then use 75% of the vectors to determine their two major Eigenvectors.
The remaining 25% are inserted into the scatterplot using dot-products.
The result is shown in Figure 16b. Finally, 50% of the vectors (on the
left side) are used to determine the Eigenvectors of the scatterplot and
the rest of the vectors (on the right side) are inserted into the
scatterplot by dot-products. The result is shown in Figure 16c.




Figure 15. a) A scatterplot of 6,155 hydroclimate vectors divided into 10
clusters. Each cluster is represented by a unique random color. b)
Corresponding cluster colors are projected to the map position. See also
color plates.

Based on visual analysis, Figures 16a and 16b are almost identical. The
scatterplot in Figure 16c is rotated slightly counterclockwise.
Nevertheless, the shape and the integrity of the scatterplot remain intact
and look very similar to Figure 16a. The near-zero Procrustes analysis
indices shown in Table 6 confirm that the three scatterplots are indeed
extremely similar.



Table 6. Procrustes matching indices of the three scatterplots shown in Figure 16

Figures          16a vs. 16b    16a vs. 16c
Matching Index   0.00363075     0.0103674

9. COMBINED VISUALIZATION TECHNIQUE



Although the two visualization techniques presented in this chapter are
intended to operate independently, we can combine them and get the best of
both worlds. The combined technique is capable of processing both
individual items one at a time (by data fusion) and large numbers of items
all together (by data stratification) efficiently and effectively.





Figure 16. a) The Eigenvectors of the scatterplot are computed using 100%
of the hydroclimate vectors. b) The Eigenvectors of the scatterplot are
computed from 75% of the vectors. The other 25% are projected onto the
scatterplot by dot-product. c) The Eigenvectors of the scatterplot are
computed from 50% of the vectors. The other 50% are projected by
dot-product. See also color plates.

We have used Procrustes analysis as the primary computational method to
evaluate the errors between a full-resolution standard scatterplot (e.g.,
Figure 14a) and ones based on the multiple sliding windows approach
(Figures 14b and 14c). A notable enhancement that speeds up the
error-tracking process (and thus the visualization process) is to replace
the full-resolution standard scatterplot with a fast and accurate
substitute like the one using stratified vectors as presented in Section
5.1.

The results in Table 1 show that we can save up to 92% of computation 
time (from 34.9s to 2.62s) if we compress the dimensions of 3,268 vectors 
by 75%. And the results in Table 3 show that a 75%-reduced data matrix 
(dimension reduced from 169 to 42) can still be as accurate as the full- 
resolution one. Because of this faster error-checking process, we can now 
afford to carry out error estimation more frequently and thus improve the 
overall quality of the analysis. 

Although the pre-processing steps of different data streams may vary, we 
can summarize the operations of the combined visualization technique in six 
major steps as follows: 

1. When influx rate < processing rate, use MDS to reprocess the entire data- 
set when new information arrives. 

2. When influx rate > processing rate, halt the MDS process. 

3. Use the multiple sliding windows approach to update the existing scatter- 
plot with the new information. Repeat Step 3 for a predefined number of 
updates. 

4. Use the stratification approach to come up with a fast overview of the 
entire dataset. 

5. Use the stratified overview to evaluate the accumulated error generated
by the multiple sliding windows method using Procrustes analysis.

6. If error threshold is reached, go to Step 1. Otherwise, go to Step 3. 
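
The six steps above can be read as a small control loop. The following Python sketch only records which operation would run at each point; it does not perform MDS, stratification, or Procrustes analysis. All names, the default of three updates per check, and the threshold value are assumptions made for illustration. The source does not say what happens when the influx rate equals the processing rate, so the sketch treats that case like Step 2.

```python
def run_combined(influx_rate, processing_rate, error_estimates,
                 updates_per_check=3, error_threshold=0.01):
    """Return the sequence of operations chosen by the six-step procedure.

    error_estimates stands in for the Procrustes errors that Step 5 would
    measure after each batch of sliding-window updates.
    """
    if influx_rate < processing_rate:
        # Step 1: MDS can keep up, so reprocess the entire dataset.
        return ["full_mds"]

    actions = ["halt_mds"]                                   # Step 2
    for error in error_estimates:
        actions += ["sliding_window_update"] * updates_per_check  # Step 3
        actions.append("stratified_overview")                # Step 4
        actions.append("procrustes_check")                   # Step 5
        if error >= error_threshold:                         # Step 6
            actions.append("full_mds")                       # back to Step 1
            break
    return actions


# A stream that arrives faster than MDS can process it; the second
# (hypothetical) error estimate exceeds the threshold.
for action in run_combined(10, 5, [0.001, 0.02], updates_per_check=2):
    print(action)
```

The design point is that the expensive full MDS run is triggered by measured error rather than by every arrival.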

10. DISCUSSION AND FUTURE WORK 

The general concept of scaling has been applied extensively to visualize
the similarities of high-dimensional information for many years. In the
case of corpus visualization, the combination of representing individual
documents as mathematical vectors and projecting them onto a
low-dimensional space using certain scaling techniques has become the de
facto approach for many corpora visualization packages. Unfortunately,
these packages are designed to analyze static corpora only and thus cannot
handle dynamic text streams proficiently. The work presented in this
chapter provides critical technology to tackle this clear and present
problem.

Decades of MDS research and development by biologists, psychologists,
statisticians, and now computational scientists have successfully reduced
the time complexity of MDS from O(n^3) at the beginning, to O(n^2)
[Chalmers, 1996] in 1996, and to O(n√n) [Morrison, Ross & Chalmers, 2002]
in 2002. However, these significant results still require the reprocessing
of at least a portion of the dataset for every update. When an individual
item arrives, it is far better to use our data fusion method, which takes
only O(m) (m = vector dimension, m ≪ n) time to obtain an instant
scatterplot update.

While our dynamic visualization approach appears to work well on text,
image, and hydroclimate streams, we have not included other data streams
in our study. Until we conduct a full investigation, we cannot claim that
our approach is without limitations.

We are in the process of integrating part of our work into an ongoing sys- 
tem development effort focusing on text stream visualization. We plan to 
evaluate the performance in a real-life environment using live text streams 
and present the results in the near future. 



11. CONCLUSIONS 

We have presented two dynamic visualization techniques to analyze tran- 
sient data streams. Using a newswire corpus and a remote sensing imagery 
sequence, we demonstrate that our data stratification approach can substan- 
tially speed up the visualization process and yet maintain the high fidelity of 
the graphic results. We also show that our data fusion approach can provide 
instant updates of an MDS scatterplot without the requirement of reprocess- 
ing all the information in full resolution. All our approximation results have 
been validated by both visual and computational tests based on Procrustes 
analysis. 

For information visualization applications similar to our examples, we
believe our two analytic concepts will play an important role in many
future data stream analysis tool designs. For other applications involving
knowledge discovery and decision making, we believe our approach will work
well with the established non-visual approaches and that the two will
complement each other.

13. EXERCISES AND PROBLEMS

1. The term Multidimensional Scaling (MDS) covers a variety of techniques 
to generate a low-dimensional plot from a high-dimensional dataset. This 
chapter shows only two of them that perform well in our applications. 
Find yet another MDS technique and see if it works as effectively as the 
two described in Section 6. 

2. The Procrustes analysis algorithm described in Section 7.1 can handle
two datasets (i.e., pairwise comparison) at a time. Redesign the algorithm
so that it can compare more than two scatterplots simultaneously.

3. Design a new visualization technique that combines both visual and
non-visual matching results described in Section 7.

SPIN! - AN ENTERPRISE ARCHITECTURE
FOR DATA MINING AND VISUAL ANALYSIS
OF SPATIAL DATA

Abstract: The rapidly expanding market for Spatial Data Mining systems and
technologies is driven by pressure from the public sector, environmental
agencies and industry to provide innovative solutions to a wide range of
different problems. The main objective of the described spatial data mining
platform is to provide an open, highly extensible, n-tier system architecture
based on the Java 2 Platform, Enterprise Edition (J2EE). The data mining
functionality is distributed among (i) a Java client application for
visualization and workspace management, (ii) an application server with an
Enterprise Java Bean (EJB) container for running data mining algorithms and
workspace management, and (iii) a spatial database for storing data and
executing spatial queries. In the SPIN! system, visual problem solving
involves displaying data mining results, using visual data analysis tools,
and finally producing a solution based on linked interactive displays with
different visualizations of various types of knowledge and data.

Key words: Spatial data mining, interactive visual analysis, enterprise architecture



1. INTRODUCTION 

Data mining is the partially automated search for hidden patterns in
typically large and multi-dimensional databases. It draws on results in
machine learning, statistics and database theory [Klösgen & Zytkow, 2002].
Data mining methods have been packaged in data mining platforms, which
are software environments providing support for the application of one or
more data mining algorithms. So far, data mining and Geographic
Information Systems (GIS) have existed as two separate technologies, each
with its own methods, traditions and approaches to visualization and data
analysis. Recently, the task of integrating these two technologies has become
highly attractive [Ester, Frommelt, Kriegel & Sander, 2000; Koperski,
Adhikary & Han, 1996; Koperski & Han, 1997], especially as various public
and private sector organizations possessing huge databases with thematic
and geographically referenced data began to realize the huge potential of
the information hidden there.

In response to this demand a prototype was developed [Andrienko,
Andrienko, Savinov & Wettschereck, 2000; Andrienko, Andrienko, Savinov,
Voss & Wettschereck, 2001] which demonstrated the potential of combining
data mining and GIS. The results of this initial prototype encouraged the
development of the SPIN! system [European IST SPIN! project web site;
May, 2000; May & Savinov, 2002]. The overall objective of the SPIN! system
is to develop a spatial data mining platform by integrating state-of-
the-art Geographic Information System (GIS) and data mining functionality
in a closely coupled, open, and extensible system architecture.

This chapter describes the SPIN! architecture and pays special attention
to such features as scalability, security, multi-user access, robustness,
platform independence and adherence to standards. The system integrates a
Geographic Information System for interactive visual data exploration with
Data Mining functionality specially adapted for spatial data. It is built
on the Java 2 Enterprise Edition (J2EE) architecture and, in particular,
uses Enterprise Java Bean (EJB) technology for implementing remote object
functionality. The flexibility and scalability of the J2EE platform have
made it the platform of choice for building multitiered enterprise
applications, so using it as a basis for a spatial data mining platform in
the SPIN! project is a natural extension.

EJB is a server-side component architecture, which cleanly separates the 
“business logic” (the analysis tools, in our case) from server issues, shielding 
the method developers from many technicalities involved in client-server 
programming. This choice also allows us to meet typical requirements found 
in business applications such as security, scalability, and platform 
independence in a principled manner. 

The system is tightly integrated with a relational database and can serve
as a data access and transformation tool for both spatial and non-spatial data.
Analysis tools can be integrated either as stand-alone modules or coupled
more tightly by distributing the analysis functionality between the database
and the core algorithm. A spatial database is used to execute the complex
spatial queries generated by the analysis algorithms. The final system
integrates several data mining methods that have been adapted to the
analysis of spatial data, such as multi-relational subgroup discovery, rule
induction and spatial cluster analysis. The system then combines these
methods with the rich interactive functionality of visual data exploration,
thus offering an integrated, distributed environment for spatial data analysis.



2. THE SYSTEM OVERVIEW 

The SPIN! spatial data mining system has a component architecture. This 
means that it provides the infrastructure and environment while all the 
system functionality is provided by separate software modules called 
components. Components can be easily added allowing for the expansion of 
system capabilities. 

Each component is developed as an independent module for solving a 
limited number of tasks. For example, there may be components for data 
access, analysis or visualization. In order to solve complex problems, 
components need to communicate with and utilize the capabilities of each 
other. For example, when an algorithm needs data to be loaded, it asks 
another component to do this task. 

To support interactions among components we developed a Common 
Connectivity Framework called CoCon — a generic library written in Java 
consisting of a number of interfaces and classes. The idea is that components 
can be connected by means of different types of connections. Currently, 
there are three connections available: visual, hierarchical and user-defined. 
Visual connections are used to link a component with its view (similar to the 
Model-View-Controller architecture). Hierarchical connections are used to 
compose parent-child relationships among components in the workspace. 
An example of this connection is the linking of a workspace folder with its 
elements. User-defined connections are the principal type used in the system. 
These connections allow for the arbitrary linking of components in the
workspace as required by the task to be solved. Using such a connection, we
could visualize a data mining result on a map by connecting the two
corresponding components.

All components are implemented on the basis of the CoCon common
connectivity framework, which allows them to communicate within one
workspace. It is important that components explicitly declare their
connectivity capabilities. This includes how to connect to a given component
and with what other components a given component can work. These
capabilities can be described either statically or dynamically. A static
description consists of listing the necessary descriptors, such as the
ability of a component to accept an incoming connection from another
component of some class. On the other hand, a dynamic determination can be
made at run time by asking each component if it can be connected. The static
connectivity information is simple to use, while the dynamic is important for
the cases where components may change their behavior at run time. As an
example of the need for such dynamic support, consider that an algorithm
might not be interested in establishing or removing connections while it is
processing data or when its requirements depend on the current parameters.
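
The difference between static and dynamic connectivity can be sketched as follows. CoCon itself is a Java library whose interfaces are not shown in this chapter, so the class names, the `accepts` descriptor, and the `can_accept` method below are hypothetical; the sketch is written in Python purely to illustrate the idea.

```python
class Component:
    # Static description: classes whose instances may connect to this one.
    accepts = ()

    def __init__(self, name):
        self.name = name
        self.busy = False       # e.g. an algorithm that is processing data
        self.incoming = []

    def can_accept(self, source):
        # Dynamic check, asked at run time. This default combines the
        # static descriptor with the component's current state.
        return not self.busy and isinstance(source, self.accepts)


class DatabaseConnection(Component):
    pass


class DatabaseQuery(Component):
    accepts = (DatabaseConnection,)


class Algorithm(Component):
    accepts = (DatabaseQuery,)


def connect(source, target):
    """Create a user-defined connection only if the target allows it."""
    if not target.can_accept(source):
        return False
    target.incoming.append(source)
    return True


connection = DatabaseConnection("census db")
query = DatabaseQuery("districts near highways")
algorithm = Algorithm("subgroup discovery")

print(connect(connection, query))      # allowed by the static descriptor
print(connect(connection, algorithm))  # rejected: wrong component class
algorithm.busy = True
print(connect(query, algorithm))       # rejected while the algorithm runs
```

A workspace editor can consult the static descriptor to decide which links to offer, and call the dynamic check at the moment the user actually draws the connection.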




Figure 1. The SPIN! client provides views for its components: rule base (upper right 
window), database connection (lower left window), database query and algorithm (lower right 
windows). The workspace is visualized in the form of tree view and graph view. 



The SPIN! spatial data mining system (Figure 1) provides an integrated
environment based on a component architecture. If the system is configured
to include a certain set of components, then it will automatically update
its main menu, tool bar and other functions. The repository of components
is available via an Insert menu where components are organized into
groups. Essentially, the system core provides common facilities and
services similar to those implemented in such platforms as Eclipse and
NetBeans. It is also a quite general, extensible platform, which is
intended for everything and for nothing simultaneously because its
functionality is determined by the current set of components.

A workspace is a set of components and corresponding connections. A 
workspace can be stored in or retrieved from a persistent storage (currently 
file or database). Once opened, a workspace appears in two views: a tree 
view and a graph view. In the tree view, the hierarchical structure of the 
workspace is visualized where components are individual tree nodes that can 
be expanded or collapsed. User-defined connections are not visible in the 
tree view but can be edited by means of the connection editor. In the graph 
view, components are visualized as nodes of a graph while user-defined 
connections are the graph edges. 

Components can be added to the workspace by selecting them either 
from the menu or the tool bar where all repository components are shown. A 
new component appears both in the tree view (Figure 1, left top) and in the 
graph view (Figure 1, left middle). After the component has been added it 
should be connected with other relevant components. This can be done by 
means of the Connection Editor dialog where we choose the target 
component for the currently selected source component. An alternative,
user-friendly way to establish new connections among components is through
the workspace graph view where connections are explicitly displayed as
arrows between nodes. In this view, connections are created by drawing 
arrows between the corresponding graph nodes designating the source and 
the target components. It is also very important that components can be 
arranged within views into visually expressive diagrams. While adding 
connections between components, the environment uses information about 
their connectivity so that connections are allowed only for components that 
can cooperate. That is, the link sticks to the target if the connection is 
possible. It is possible to move components in the graph view; their positions 
are remembered in the workspace. 

Each component has an appropriate view. In fact, the views are also 
connectable components. However, such view connections are not persistent. 
Each component can be opened in a separate window so that the user can 
access its functions. When a workspace component is opened, the system 
automatically creates a view, connects it with the model and then displays it 
in a separate internal window within the main system frame. 

Typical data mining tasks include data preprocessing, analysis and
visualization. The SPIN! system includes Database Connection and Database
Query components for accessing data. The Database Connection is intended
for storing information about the database in which the data is stored. To use
the identified database, the Database Connection component needs to be
connected with some other component via, say, the graph view. The
Database Query component describes a typical database query, for example,
how a result set is generated from the tables in the database. Essentially, this
component is a SQL query design tool, which allows a user to describe a
result set by choosing appropriate items such as tables, columns, restrictions,
and functions. Since the system is intended to be used for spatial data mining,
this component includes spatial functions and predicates that are specific to
an Oracle database. An example would be a RELATE predicate, which is
true only if a certain type of relationship between geometric objects exists.
The Database Query component does not include information about the
database and thus must be linked to the Database Connection component.
Alternatively, the query can be described manually in SQL. Notice also that
neither the Database Connection nor the Database Query component works
alone; other components must make use of them. Such encapsulation of
functionality and use of user-defined connections to compose various
aggregates has been one of the main design goals of the SPIN! component
architecture.
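
As an illustration of a query held as a description rather than as data, the following Python sketch assembles a SQL string from chosen tables, columns, and restrictions. The `QueryDescription` class and the table and column names are hypothetical and are not part of SPIN!. The spatial restriction uses Oracle Spatial's `SDO_RELATE` operator as an example of the kind of RELATE predicate mentioned above; which predicates the component actually offers is not specified in this chapter.

```python
from dataclasses import dataclass, field


@dataclass
class QueryDescription:
    """Describes a result set without containing any database details."""
    tables: list
    columns: list
    restrictions: list = field(default_factory=list)

    def to_sql(self):
        sql = "SELECT " + ", ".join(self.columns)
        sql += " FROM " + ", ".join(self.tables)
        if self.restrictions:
            sql += " WHERE " + " AND ".join(self.restrictions)
        return sql


# Hypothetical tables: districts with a population, and highways.
query = QueryDescription(
    tables=["districts d", "highways h"],
    columns=["d.name", "d.population"],
    restrictions=[
        "d.population > 10000",
        # True only if the two geometries interact in some way.
        "SDO_RELATE(d.geom, h.geom, 'mask=ANYINTERACT') = 'TRUE'",
    ],
)
print(query.to_sql())
```

Because the description carries no connection details, it only becomes executable once it is linked to a Database Connection component, which mirrors the separation described above.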

Any knowledge discovery task includes a data analysis step where the
dataset obtained from the preprocessing step is processed by an appropriate
data mining algorithm. The SPIN! system currently includes several components
that implement different approaches to mining both ordinary and spatial
data: spatial clustering, subgroup discovery [Klösgen & May, 2002; Klösgen
& Zytkow, 2002], spatial association rules [Lisi & Malerba, 2002], Bayesian
analysis, and rule induction based on finding largest empty intervals in data
[Savinov, 2000a; Savinov, 2003]. Subgroup discovery and rule induction
will be described in Section 4 of this chapter.



3. THE SYSTEM ARCHITECTURE 



3.1 N-tier EJB-based Architecture 

The general SPIN! architecture is shown in Figure 2. It is an n-tier
client/server architecture based on Enterprise Java Beans for the server-side
components. A major advantage of using Enterprise Java Beans is that such
tasks as controlling and maintaining user access rights, handling multi-user
access, pooling of database connections, caching, handling persistency and
transaction management are delegated to the EJB container. The architecture
has the following major subsystems: a client, application servers each with
one or more EJB containers, one or more database servers and optional
compute servers.

Figure 2. SPIN! platform architecture. The main components are a Java-based client, an
Enterprise Java Beans Container and one or more databases serving spatial and non-spatial
data.

The SPIN! client is a standalone Java application. It always creates one
server-side representative in the form of a session bean. The methods of the
session bean are accessed through the corresponding remote reference via
the Java RMI or CORBA IIOP protocol. The client session bean executes
various server-side tasks on behalf of the client. In particular, workspace
objects may be loaded from or saved to persistent storage. The client is
based on the component connectivity framework, which is implemented in Java
as a connectivity library (CoCon). The idea is that the workspace consists of
components, each of which is considered a store for a set of parameters and
pieces of functionality such as algorithms. The system functionality is
determined by the set of available components.

The application server is an Enterprise Java Bean container. It manages
the client workspace, analysis tasks, data access and persistency. There may
be multiple containers running simultaneously on one or more servers.
Among other things, this means that different algorithms and alternate tasks 
can be executed on different computers under different restrictions. The 
SPIN! system uses an EJB container for making workspaces persistent in the 
database and for remote computations. For the first task the client creates a 
special session bean, which is responsible on the server side for workspace 
persistence and access. Specifically, if the client needs to load or save a 
workspace it delegates this task to this session bean. The client creates one 
remote object for each analysis task that is to be run so that data can be 
transferred directly from the database to the algorithm. After the analysis is 
finished, the result is transferred to the client for visualization. 

User data are stored in the primary data storage, which is a relational
database system accessed via the JDBC protocol, which is a part of the J2EE
standard. The database can reside on the same machine as the application
server, on the client machine, or on a separate dedicated computer.
Optionally, there may be one or more secondary databases. In addition, data
can be loaded from other sources such as databases, ASCII files in the file
system or Excel files. For remote computations in the application, it is
important that server data be transferred directly into the remote algorithm,
bypassing the client. Only a set of components (a subgraph of the workspace)
is transferred between the application server and the client. In enterprise
applications the amount of data processed may be quite large, so it is very
important to avoid unnecessary network traffic. In particular, if the data is
going to be processed remotely in the application server, then it should be
transferred directly from the storage to this computing server. This is
precisely what is done in the SPIN! system, where the workspace stores only
the data description. This data description is transferred to the application
server, which loads the data itself directly from the specified database for
processing.

3.2 Remote Algorithm Management 

The developed architecture assumes that all algorithms are executed on
compute servers. For each running algorithm a separate session bean is
created, which implements high-level methods for controlling the algorithm
behavior, in particular starting and stopping the execution, getting and
setting parameters, setting the data to process, and getting the result. The
session bean is then responsible for the method's implementation. There are
several ways in which this can be done.

• A clean and convenient but in some cases inefficient approach is 
using Java for implementing the complete algorithm directly within 
the corresponding EJB along with loading all data via JDBC into the 
workspace. 

• A second approach divides the labor between the EJB container and
the relational database. We have implemented a multi-relational
spatial subgroup-mining algorithm [Klösgen & May, 2002] that does
most of the analysis work (especially the spatial analysis) directly in
the database. The EJB part retrieves summary statistics, manages
hypotheses and controls the search.

• A third approach consists in implementing computationally intensive
methods in native code wrapped in a shared library by means of the Java
Native Interface (JNI). A rule induction algorithm based on finding
largest empty intervals in data [Savinov, 2000a; Savinov, 2003] has
been implemented in this way, namely, as a dynamically linked
library whose functions are called by the EJB algorithm.




• A fourth option is that the algorithm session bean directly calls an
external executable module. This approach has been used to run the
SPADA algorithm [Lisi & Malerba, 2002].

• And finally other remote objects (e.g. CORBA) can be used to 
execute the task. 

The algorithm parameters are formed in the client and transferred to the 
EJB algorithm as a workspace component before the execution. In particular, 
the data to be processed by the algorithm has to be specified. It is
important to note that only a data description is specified and transferred,
not the complete data set. In other words, the EJB algorithm gets information concerning
where and how to find the data and what kind of restrictions to use. Thus 
when the algorithm starts, the data is directly retrieved by the EJB algorithm 
rather than passing this information through the client. 

For example, assume that we need to find interesting subgroups in
spatially referenced data [Klösgen & May, 2002]. The data is characterized
by both thematic attributes such as population and spatial attributes such as 
proximity to a highway or the percentage of forests in the area. The data to 
be analyzed is specified in the corresponding component where we can 
choose tables, columns, and join and restriction conditions including spatial 
operators supported by the underlying database system. The algorithm 
component is connected to the data component and the subgroup pattern 
component. The algorithm component creates a remote algorithm object in 
the EJB container as a session bean and transfers all necessary components 
to it such as the data description. The remote EJB object starts computations 
while its local counterpart periodically checks the remote object state until 
the process is finished. During computations the remote object retrieves data, 
analyzes it and stores the result in the result component. Note that each client 
may start several local and remote analysis algorithms simultaneously. 
When this happens, each algorithm is created as a separate thread. Once 
interesting subgroups have been discovered and stored in a component, they 
can be visualized in a special view, which provides a list of all subgroups 
with all parameters as well as a two-dimensional chart where each subgroup 
is represented by a point according to its coverage and strength. 
Additionally, the data analyzed by the subgroup discovery data mining 
algorithm can be viewed in a geographic information system and analyzed 
by visual analysis methods. 

3.3 Workspace Management 

One task that is very important in a distributed environment is 
workspace management. During any given session, the user loads a 
workspace from a central store into the client and begins working. When the 
work is finished, the workspace is stored back in its initial location or 
perhaps in a new one. A persistent workspace can be implemented in two 
alternative ways: 

1. the whole workspace can be serialized and stored in one object like a 
local/remote file or a database record, or 

2. the workspace components and connections can be stored separately 
in different database records. 

The first approach is much simpler but it is difficult to share workspaces. 
The second approach allows us to treat workspace components as individual 
objects even within persistent storage; that is, the whole workspace graph 
structure is represented in the storage. 




Figure 3. A workspace is a graph where nodes are components and edges are connections 
between them. All workspaces are stored in a database and retrieving a workspace means 
finding its component and connection objects. The persistent workspace management 
functionality is implemented as a session bean, which manipulates two types of entity beans: 
workspace components and workspace connections. 

We have implemented both approaches. In both cases the workspace 
is represented as a special graph object: a set of nodes (workspace 
components) and a set of edges (workspace connections). These graphs 
can be created from existing run-time workspace objects by specifying 
constraints on their nodes and connections. For example, for loading and 
storing workspaces, view connections are ignored. The selected subgraph is 
passed to the persistence manager. If it needs to be stored as one object, 
then the whole graph is serialized. Otherwise, individual node and edge 
objects are serialized. We used XML for serialization; that is, the state of 
any object is represented as XML text. 
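
The two persistence options can be illustrated with a small sketch. It is written in Python for brevity rather than in the Java/EJB setting of SPIN!, and the element names (workspace, component, connection), the kind attribute and the sample components are assumptions made for this example, not the actual SPIN! XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical run-time workspace: nodes are components, edges are connections.
components = {"db": "connection", "query": "query", "rules": "rulebase", "map": "view"}
connections = [("db", "query", "data"), ("query", "rules", "data"), ("rules", "map", "view")]


def select_subgraph(components, connections, skip_kinds=("view",)):
    """Apply a constraint on the edges: view connections are not persisted."""
    edges = [c for c in connections if c[2] not in skip_kinds]
    return components, edges


def serialize_whole(components, edges):
    """Option 1: the whole graph becomes one XML document (one file or record)."""
    root = ET.Element("workspace")
    for name, ctype in components.items():
        ET.SubElement(root, "component", name=name, type=ctype)
    for src, dst, kind in edges:
        ET.SubElement(root, "connection", source=src, target=dst, kind=kind)
    return ET.tostring(root, encoding="unicode")


def serialize_parts(components, edges):
    """Option 2: one XML fragment per node and per edge (one record each)."""
    nodes = [ET.tostring(ET.Element("component", name=n, type=t), encoding="unicode")
             for n, t in components.items()]
    links = [ET.tostring(ET.Element("connection", source=s, target=d, kind=k),
                         encoding="unicode")
             for s, d, k in edges]
    return nodes, links


comps, edges = select_subgraph(components, connections)
whole_document = serialize_whole(comps, edges)
node_records, edge_records = serialize_parts(comps, edges)
```

With the second option the node and edge fragments would go into two separate tables, so that a component can be looked up without deserializing the whole workspace.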

The functionality of remote workspace management is implemented by a 
special session bean. This EJB has functions for loading and storing 
workspaces. If the workspace is stored as a set of its constituents then the 
session bean uses entity beans that correspond to the workspace components. 
The state of such workspaces is stored in two tables: one for nodes and one 
for edges. There exist two classes of entity beans, which are used to 
manipulate these two tables. The workspace management architecture for 
this case is shown in Figure 3. 



4. ANALYSIS OF SPATIAL DATA 



4.1 Mining Interesting Spatial Rules 

The conventional approach to rule induction consists in finding 
anomalies by searching for intervals with surprisingly high values of the 
probability distribution representing the data semantics (the larger the 
interval the better). For instance, in association rule mining, patterns are 
generated in the form of itemsets and their interestingness is measured by 
support (the number of objects satisfying both condition and conclusion) and 
confidence (the proportion of objects satisfying the rule consequent among 
those satisfying the antecedent). For example, we might infer a rule where 
some item, such as high long-term illness, has 99% confidence under certain 
conditions. Implicitly this means that other (mutually exclusive) items such 
as medium and low long-term illness have a much lower probability, very 
close to 0. Thus, the rule semantics can be reformulated as the 
incompatibility of some target values with items in the condition. Interesting 
rules then can be generated by finding item combinations that never occur in 
the data. The goal is still finding some kind of anomalous behavior but the 
main distinction from traditional approaches is that we are trying to find 
empty areas instead of high frequency areas in the data. A related approach 
to mining association rules based on this principle is described in [Liu, Ku & 
Hsu, 1997; Liu, Wang, Mun & Qi, 1998; Ku, Liu & Hsu, 1997; Edmonds, 
Gryz, Liang & Miller, 2001] where empty intervals among numeric 
attributes are called holes in data. The holes are found by using an algorithm 
[Orlowski, 1990; Chazelle, Drysdale & Lee, 1986] from computational 
geometry. In the SPIN! system, we apply an original rule induction 
algorithm, called Optimist [Savinov, 1999a; Savinov, 1999b; Savinov, 
2000b], which works with finite-valued attributes and generates rules in one 
pass through the data set by using the method of sectioned vectors. 
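
The difference between the two views can be sketched as follows. The code is a minimal illustration in Python, not the Optimist algorithm: it computes support and confidence for one rule and, alternatively, lists the value combinations that never occur in the data. The attribute names and the four records are invented for the example.

```python
from itertools import product


def support_and_confidence(records, antecedent, consequent):
    """records: list of dicts; antecedent/consequent: dicts attribute -> value."""
    def matches(record, condition):
        return all(record.get(a) == v for a, v in condition.items())

    covered = [r for r in records if matches(r, antecedent)]
    both = [r for r in covered if matches(r, consequent)]
    support = len(both)
    confidence = support / len(covered) if covered else 0.0
    return support, confidence


def empty_combinations(records, attributes):
    """Value combinations (over the observed domains) that never occur."""
    domains = [sorted({r[a] for r in records}) for a in attributes]
    seen = {tuple(r[a] for a in attributes) for r in records}
    return [combo for combo in product(*domains) if combo not in seen]


# Invented toy data.
records = [
    {"unemployment": "high", "illness": "high"},
    {"unemployment": "high", "illness": "high"},
    {"unemployment": "high", "illness": "low"},
    {"unemployment": "low", "illness": "low"},
]
support, confidence = support_and_confidence(
    records, {"unemployment": "high"}, {"illness": "high"})
holes = empty_combinations(records, ["unemployment", "illness"])
```

Here the only empty combination is low unemployment together with high illness, which reads directly as the rule "IF unemployment is low THEN illness is not high".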

Let us assume that attributes $x_1, x_2, \ldots, x_n$ take a finite number of 
values, $n_i$, from their domains $A_i = \{a_{i1}, a_{i2}, \ldots, a_{i n_i}\}$. 
All combinations of values 
$\omega = (x_1, x_2, \ldots, x_n) \in \Omega = A_1 \times A_2 \times \ldots \times A_n$ 
form the state space or universe 
of discourse. Each record from a data set corresponds to one combination of 
attribute values or a point. If, for a combination of values, a record in the 
data set exists then it is said to be possible. Otherwise, the point is 
impossible. To represent the data semantics as a Boolean distribution over the 
universe of discourse we use the method of sectioned vectors and matrices 
[Savinov, 1999b; Savinov, 2000b]. The main idea of the method is that one 
vector can represent a multidimensional interval of possible or impossible 
points (also called positive and negative intervals, respectively). Each vector 
consists of 0s and 1s that are grouped into sections separated by dots, with 
one section per attribute. The $i$-th section consists of $n_i$ components 
corresponding to the values of the $i$-th attribute. For example, 01.010.0101 
is a sectioned vector for three attributes taking 2, 3 and 4 values. Each 
component corresponds to one attribute value so that the number of 
components in the vector is equal to the total number of attribute values. A 
sectioned vector associates $n$ components with each point from $\Omega$ (one 
from each section). The positions of these components in the vector 
correspond to the point coordinates. To represent negative intervals we use a 
disjunctive interpretation of the sectioned vector. This means that a point is 
assigned 0 if all its components in the vector are 0s, and it is assigned 1 if at 
least one of its components is 1. For example, the point 
$(a_{11}, a_{21}, a_{31})$ is impossible according to the semantics of the 
vector 01.010.0101 because all three components corresponding to its 
coordinates in the vector (whose components are ordered as 
$a_{11}a_{12}.a_{21}a_{22}a_{23}.a_{31}a_{32}a_{33}a_{34}$) are zeros. Yet the 
point $(a_{11}, a_{22}, a_{33})$ is possible because the component 
corresponding to $a_{22}$ is 1 in the vector 01.010.0101. 
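
The disjunctive interpretation can be checked with a few lines of code. The sketch below is in Python and only illustrates the definition above; points are given as tuples of 0-based value indices (so $(a_{11}, a_{22}, a_{33})$ becomes (0, 1, 2)), and the function names are our own choice.

```python
from itertools import product


def parse(vector):
    """'01.010.0101' -> [[0, 1], [0, 1, 0], [0, 1, 0, 1]]"""
    return [[int(ch) for ch in section] for section in vector.split(".")]


def is_possible(vector, point):
    """Disjunctive interpretation: 1 if at least one selected component is 1."""
    sections = parse(vector)
    return any(section[i] == 1 for section, i in zip(sections, point))


def impossible_points(vector):
    """Enumerate all points of the universe that the vector marks as impossible."""
    sections = parse(vector)
    index_ranges = [range(len(section)) for section in sections]
    return [pt for pt in product(*index_ranges) if not is_possible(vector, pt)]


vector = "01.010.0101"
first = is_possible(vector, (0, 0, 0))   # point (a11, a21, a31)
second = is_possible(vector, (0, 1, 2))  # point (a11, a22, a33)
empty = impossible_points(vector)
```

The impossible points form the interval obtained by taking, in every section, the values whose component is 0; for this vector that gives 1 x 2 x 2 = 4 points out of 24.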

The main idea behind the algorithm for finding the largest empty intervals 
consists in representing the data semantics by a set of negative sectioned 
vectors and updating it for each record. Initially the data is represented only 
by the empty interval consisting of all 0s, which makes all points impossible. 
After the first record is added, it is split into several smaller negative 
intervals so that the point corresponding to this record becomes possible. For 
example, adding the record 01.001.0001 (where 1s correspond to its values) to 
the interval 00.010.0100 splits it into three new intervals: 01.010.0100, 
00.011.0100, and 00.010.0101 (each differs from the original interval in one 
component, which is changed from 0 to 1). During this procedure very small 
intervals with many 1s are removed since they generate very specific rules, 
leaving only the top set of the largest intervals. 
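
The splitting step can be sketched as follows. This is a simplified Python illustration of the update described above, not the Optimist implementation: it does not remove intervals that are covered by other intervals, and the pruning criterion (keep the vectors with the fewest 1s) and the parameter name max_patterns are assumptions.

```python
def split_interval(interval, record):
    """Split a negative interval so that the record's point becomes possible."""
    sections = [list(s) for s in interval.split(".")]
    positions = [s.index("1") for s in record.split(".")]
    # The point is already possible if one of its components is 1: no split.
    if any(sec[i] == "1" for sec, i in zip(sections, positions)):
        return [interval]
    result = []
    for k, i in enumerate(positions):
        new_sections = [sec[:] for sec in sections]
        new_sections[k][i] = "1"  # change exactly one component from 0 to 1
        result.append(".".join("".join(sec) for sec in new_sections))
    return result


def add_record(intervals, record, max_patterns=100):
    """Update the whole set of negative intervals for one new record."""
    updated = []
    for interval in intervals:
        for new_interval in split_interval(interval, record):
            if new_interval not in updated:
                updated.append(new_interval)
    # Fewer 1s means a larger empty interval; keep only the largest ones.
    updated.sort(key=lambda v: v.count("1"))
    return updated[:max_patterns]


parts = split_interval("00.010.0100", "01.001.0001")
after_first_record = add_record(["00.000.0000"], "01.001.0001")
```

Calling split_interval on the example from the text yields the three intervals 01.010.0100, 00.011.0100 and 00.010.0101.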
Once the largest empty intervals have been found, they can be easily 
transformed into rules by negating the sections that should be in the 
antecedent. For example, the vector $\{0,1\} \vee \{0,1,0\} \vee \{0,1,0,1\}$ 
can be transformed into the implication 
$\{1,0\} \wedge \{1,0,1\} \rightarrow \{0,1,0,1\}$, interpreted as the rule IF 
$x_1 = \{a_{11}\}$ AND $x_2 = \{a_{21}, a_{23}\}$ THEN 
$x_3 = \{a_{32}, a_{34}\}$. The rules are then supplemented with statistical 
information in the form of the target value frequencies within the rule 
condition interval (this requires one additional pass through the data set). In 
other words, each value in the conclusion is assigned its frequency within the 
condition interval, for example, IF $x_1 = \{a_{11}\}$ AND 
$x_2 = \{a_{21}, a_{23}\}$ THEN $x_3 = \{a_{32}: 145, a_{34}: 178\}$, which is 
obviously more expressive. Here 145 means that the value $a_{32}$ occurs 145 
times within the selected interval. 
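
The transformation of a negative vector into a rule, and the additional pass that collects the frequencies, can be sketched in Python as follows. The function names, the dictionary-based record format and the four sample records are assumptions made for the illustration.

```python
from collections import Counter


def vector_to_rule(vector, names, domains, target):
    """Negate the antecedent sections; keep the target section as conclusion."""
    sections = vector.split(".")
    conditions = []
    for k, section in enumerate(sections):
        if k == target:
            continue
        # Negation: the antecedent lists the values whose component is 0.
        allowed = [domains[k][i] for i, bit in enumerate(section) if bit == "0"]
        conditions.append((names[k], allowed))
    conclusion = [domains[target][i]
                  for i, bit in enumerate(sections[target]) if bit == "1"]
    return conditions, (names[target], conclusion)


def target_frequencies(conditions, conclusion, records):
    """One extra pass over the data: count target values inside the condition."""
    target_name, values = conclusion
    counts = Counter()
    for record in records:
        if all(record[attr] in allowed for attr, allowed in conditions):
            if record[target_name] in values:
                counts[record[target_name]] += 1
    return {value: counts[value] for value in values}


names = ["x1", "x2", "x3"]
domains = [["a11", "a12"], ["a21", "a22", "a23"], ["a31", "a32", "a33", "a34"]]
conditions, conclusion = vector_to_rule("01.010.0101", names, domains, target=2)

# Invented records, only to show the counting step.
records = [
    {"x1": "a11", "x2": "a21", "x3": "a32"},
    {"x1": "a11", "x2": "a23", "x3": "a34"},
    {"x1": "a11", "x2": "a23", "x3": "a32"},
    {"x1": "a12", "x2": "a21", "x3": "a31"},
]
frequencies = target_frequencies(conditions, conclusion, records)
```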




Figure 4. Visualization of spatial rules simultaneously and interactively with the map and 
other views in the SPIN! system. As one rule is selected in the upper right view all objects 
satisfying its condition are dynamically highlighted on the map in the lower right window. 



The Optimist algorithm has been implemented as one of the components of the 
SPIN! spatial data mining system (Figure 4). It is tuned by a set of 
algorithm parameters such as the maximal number of patterns (empty 
intervals) and is executed either on the client or on the server. Input data for 
the algorithm is specified by a standard SPIN! query component, which uses a 
separate connection component to access a database. The spatial rules 
generated by the algorithm are stored in a rule base component. When 
appropriately connected, this minimal set of components implements the 
conventional knowledge discovery cycle. The analysis starts by specifying a 
database and a query that produces data in the necessary format. In our 
case, we need data columns to take only a finite number of values. Since 
most of the source data had continuous attributes, we applied the SPIN! 
optimal discretization algorithm [Andrienko, Andrienko & Savinov, 2001]. 
Once the columns had been discretized, it was necessary to generate spatial 
attributes. For this purpose we used the spatial functionality of the Oracle 9i 
database, where objects are represented by means of a special built-in 
geometry type. Using such a representation, a query can combine spatial 
information with thematic data describing objects located in space. It is 
important that various spatial properties can be automatically generated by 
the database with the help of spatial predicates and relationships. 

We used UK 1991 census data for Stockport, one of the ten districts in 
Greater Manchester, UK. The analysis was carried out at the level of 
enumeration districts (the lowest level of aggregation) characterized by such 
attributes as persons per household, cars per household, migration, long-term 
illness, unemployment and other census statistics. Spatial information was 
available as coordinates and borders of objects such as enumeration districts, 
water, roads, streets, railways, and bus stops. For a typical analysis, we 
might be interested in finding dependencies among different thematic and 
spatial attributes, for example, what spatial and non-spatial factors influence 
long-term illness. As a spatial characteristic we define an attribute that 
counts the number of water resources belonging to each enumeration district, 
calculated by means of an SQL statement with a spatial join. The final result 
set produced by the SQL query is a normal table, which can be directly 
analyzed by the Optimist rule induction algorithm. The following example has 
been generated by such an analysis, where MARRIED is the percentage of 
married people and WATER_NUM_REL is a characteristic of water resources in 
the enumeration district: 

IF MARRIED (461) {high (46%) OR medium (53%)} 

AND WATER_NUM_REL (447) {low (58%) OR medium (41%)} 

THEN LONG_TERM_ILLNESS (358) {low (68%) OR medium (31%)} 

The produced rules can be shown in their own window where the rules 
can be studied in detail. However, the SPIN! system provides a much more 
powerful method by using linked displays and interactive visualization 
[Andrienko et al., 2001]. The idea is that objects described in one view can 
be simultaneously visualized in other views. In our case, the rules describe 
enumeration districts and these very districts can be simultaneously shown 
on a map. Moreover, if we select a certain rule, all objects that satisfy its 
left-hand side are dynamically highlighted on the map so that we can easily 
see how they are spatially distributed (Figure 4). For example, we might find 
that enumeration districts satisfying a certain rule and thus having interesting 
characteristics in terms of the target attribute form a cluster or have a more 
complex spatial configuration with respect to other geographic objects such 
as roads and cities. 

4.2 Spatial Subgroup Mining 

In this section, we focus on spatial patterns from the perspective of the 
subgroup-mining paradigm. Subgroup Mining [Klosgen, 1991, 1996, 2002] 
is used to analyze dependencies between a target variable and a large 
number of explanatory variables. The search looks for interesting subgroups 
that show some type of deviation, for example, subgroups with a 
disproportionately high target share for a value of a discrete target variable, 
or a high mean for a continuous target variable. An advanced subgroup-mining 
algorithm has been implemented in the SPIN! system as one of its 
components. It supports multirelational hypotheses, efficient database 
integration, discovery of causal subgroup structures, and visualization-based 
interaction options. The goal is to provide a spatial mining tool applicable in 
a wide range of circumstances. 

Spatial subgroups are represented using an object-relational query 
language by embedding part of the search algorithm in a spatial database 
(SDBS). Thus the data mining and the visualization in a GIS share the same 
data. This approach embraces the full complexity and richness of the 
spatial domain, whereas most approaches to Spatial Data Mining export and 
pre-process the data from an SDBS. Our approach results in significant 
improvements for all stages of the knowledge discovery cycle: 

1. Data Access: Subgroup Mining is partially embedded in a spatial 
database, where the analysis is performed. No data transformation is 
necessary and the same data is used for analysis and mapping in a 
GIS. This is important for the applicability of the system since 
pre-processing of spatial data is error-prone and complex. 

2. Pre-processing and Analysis: SubgroupMiner handles both numeric 
and nominal target attributes. Numeric explanatory variables are 
discretized on the fly. Spatial and non-spatial joins are 
executed dynamically. 

3. Post-processing and Interpretation: Similar subgroups are clustered 
according to the degree of overlap of their instances to identify 
multicollinearities. A Bayesian network between subgroups can be 
inferred to support causal analysis. 

4. Visualization: SubgroupMiner is dynamically linked to a GIS, so that 
spatial subgroups are visualized on a map. This allows the user to 
bring background knowledge into the exploratory process, to 
perform several forms of interactive sensitivity analysis and to 
explore the relation to further variables and spatial features. 

Subgroups are subsets of analysis objects described by selection 
expressions of a query language such as simple conjunctive attributive 
selections, or multirelational selections joining several tables. Spatial 
subgroups are described by a spatial query language that includes operations 
on the spatial references of objects. For instance, a spatial subgroup may 
consist of the enumeration districts of Manchester that are intersected by a 
certain river. A spatial predicate (intersects) operates on the coordinates of 
the spatially referenced objects, here enumeration districts and rivers. 

Subgroups are described via a hypothesis language. The domain is an 
object-relational database schema $S = \{R_1, \ldots, R_n\}$ where each $R_i$ 
can have at most one geometry attribute $G_i$. Multirelational subgroups are 
represented by a concept set $C = \{C_i\}$, where each $C_i$ consists of a set 
of conjunctive attribute-value pairs 
$\{C_i.A_1 = v_1, \ldots, C_i.A_n = v_n\}$ from a relation in $S$, and a set of 
links $L = \{L_i\}$ between two concepts $C_j$, $C_k$ in $C$ via their 
attributes $A_m$, $A_n$, where the link has the form 
$C_j.A_m \; \theta \; C_k.A_n$, and $\theta$ can be '=', a distance, or a 
topological predicate (disjoint, meet, equal, inside, contains, covered by, 
covers, overlap, or interacts). 

For example, the subgroup “districts with high rate of migration and 
unemployment crossed by the M60” can be represented as 

C = {{district.migration = high, district.unemployment = high}, 
     {road.name = ‘M60’}} 

L = {{spatially_interact(district.geometry, road.geometry)}} 

Existential quantifiers of the links are problematic when many objects are 
linked, for example, many persons living in a city or many measurements of 
a person. The condition that one of these objects has a special value 
combination will often not result in a useful subgroup. In this case, 
conditions based on aggregates such as counts, shares or averages will be 
more useful [Knobbe, de Haas & Siebes, 2001; Krogel & Wrobel, 2001]. 

These aggregation conditions are included by aggregation operations 
(avg, count, share, min, max, sum) for an attribute of a selector. An average 
operation on a numerical attribute additionally needs labeled intervals to be 
specified. 

C = (district.migration = high; building.count(id) = high) 

L = (spatially_interact(district.geometry, building.geometry)) 

Extension: Districts with many buildings. 

For buildings.sum the labels low, normal, high and their intervals are 
specified. 

Multirelational subgroups were described in [Wrobel, 1997] in an 
Inductive Logic Programming (ILP) setting. Our hypothesis language is 
more powerful due to numeric target variables, aggregations, and spatial 
links. Moreover, all combinations of numeric and nominal variables in the 
independent and dependent variables are permitted in the problem 
description (Table 1). Numeric independent variables are discretized on the 
fly. This increases applicability of subgroup mining. 



Table 1. All combinations of numeric and nominal variables are permitted 

                           Numeric Target    Nominal Target 
Numeric input variables    Yes               Yes 
Nominal input variables    Yes               Yes 
Mixed numeric/nominal      Yes               Yes 



Our approach is based on an object-relational representation. The 
formulation of queries depends on non-atomic data types for the geometry, 
spatial operators based on computational geometry, grouping and 
aggregation. None of these features is present in basic relational algebra or 
Datalog. An interesting theoretical framework for the study of spatial 
databases is provided by constraint databases [Kuper, Libkin & Paredaens, 
2000], which can be formulated as (non-trivial) extensions of relational 
algebra or Datalog. However, using SQL is more direct and much more 
practical for our purposes. The price to pay is that SQL extended by 
object-relational features is less amenable to theoretical analysis (but see 
[Libkin, 2001]). For calculating spatial relationships, spatial extensions of a 
DBMS, such as Oracle Spatial, can be used. 

For database integration, it is necessary to express a multirelational 
subgroup as defined above as a query of a database system. The result of the 
query is a table representing the extension of the subgroup description. One 
part of this query defines the subset of the product space according to the 
$l$ concepts and $l-1$ link conditions. The from part includes the $l$ (not 
necessarily different) tables and the where part includes the $l-1$ link 
conditions (they are given as strings or default options in the link 
specification; spatial extensions of SQL apply a special syntax for the spatial 
operations). Additionally, the where part includes the conditions associated 
with the definition of the selectors of the concepts. Then the aggregation 
conditions are applied and finally the product space is projected to the target 
table (using the DISTINCT feature of SQL). 
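
A minimal sketch of this translation is given below for the M60 example, without aggregation conditions and without duplicate uses of a table. It is written in Python with the SQLite module of the standard library instead of Oracle Spatial. Everything specific to the sketch is an assumption: the table layout, the id key column, the sample rows, geometries simplified to bounding boxes stored as text, and a user-defined function spatially_interact that stands in for the spatial operator of the database. Table and attribute names are interpolated into the SQL text and are therefore assumed to come from a trusted schema description.

```python
import sqlite3


def subgroup_query(target_table, concepts, links):
    """concepts: list of (table, {attribute: value}); links: SQL link conditions."""
    tables = [table for table, _ in concepts]           # the from part
    where = list(links)                                 # the link conditions
    params = []
    for table, selectors in concepts:                   # the attributive selectors
        for attribute, value in selectors.items():
            where.append(f"{table}.{attribute} = ?")
            params.append(value)
    sql = (f"SELECT DISTINCT {target_table}.id FROM {', '.join(tables)} "
           f"WHERE {' AND '.join(where)}")
    return sql, params


def boxes_interact(a, b):
    """Stand-in spatial predicate: do two 'xmin,ymin,xmax,ymax' boxes overlap?"""
    ax1, ay1, ax2, ay2 = map(float, a.split(","))
    bx1, by1, bx2, by2 = map(float, b.split(","))
    return int(ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2)


conn = sqlite3.connect(":memory:")
conn.create_function("spatially_interact", 2, boxes_interact)
conn.executescript("""
CREATE TABLE district (id INTEGER, migration TEXT, unemployment TEXT, geometry TEXT);
CREATE TABLE road (id INTEGER, name TEXT, geometry TEXT);
INSERT INTO district VALUES (1, 'high', 'high', '0,0,2,2');
INSERT INTO district VALUES (2, 'high', 'high', '5,5,6,6');
INSERT INTO district VALUES (3, 'low', 'high', '1,1,3,3');
INSERT INTO road VALUES (10, 'M60', '1,0,1,4');
INSERT INTO road VALUES (11, 'A6', '5,5,7,7');
""")

sql, params = subgroup_query(
    "district",
    [("district", {"migration": "high", "unemployment": "high"}),
     ("road", {"name": "M60"})],
    ["spatially_interact(district.geometry, road.geometry)"],
)
subgroup_ids = [row[0] for row in conn.execute(sql, params)]
```

The generated statement has two tables in the from part, one link condition and three selector conditions in the where part, and the DISTINCT projection to the target table.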

The complexity of the SQL statement is low for a single relational 
subgroup. Only the attributive selectors must be included in the where part 
of the query. For multirelational subgroups without aggregates and without 
distinction of multiple instances, the from part must manage possible 
duplicate uses of tables, and the where part includes the link conditions 
(transformed from the link specification) and the attributive selectors. For 
aggregation queries, a nested two-level select statement is necessary, first 
constructing the multirelational attributive part and then generating the 
aggregations. Multiple instances of objects of one table are treated by 
including the table in the from part several times and the distinction 
predicate in the where part. 

The space of subgroups to be explored within a search depends on the 
specification of a relation graph, which includes tables (object classes) and 
links. For spatial links the system can automatically identify geometry 
attributes by which spatial objects are linked, since there is at most one such 
attribute. A relation graph constrains the multi-relational hypothesis space in 
a similar way as attribute selection constrains it for single relations. 

The search for interesting subgroups is arranged as an iterated 
general-to-specific, generate-and-test procedure. In each iteration, a number 
of parent subgroups are expanded in all possible ways, the resulting 
specialized subgroups are evaluated, and the subgroups to be used as parent 
subgroups for the next iteration step are selected. The iteration stops when a 
pre-specified depth is reached or no further significant subgroup can be 
found. There is a natural partial ordering of subgroup descriptions. According 
to this partial ordering, a specialization of a subgroup either adds a further 
selector to any of the concepts of the description or introduces an additional 
link to a further table. 
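
The iteration can be sketched as follows for the single-relational case, where a specialization only adds a selector. This is a simplified Python illustration, not the SubgroupMiner search manager: the quality function is passed in as a parameter, and the one used in the example (difference of target shares weighted by the square root of the subgroup size) is a deliberately simple stand-in. The parameter names depth, beam and min_quality, the threshold values and the eight records are assumptions.

```python
from math import sqrt


def specializations(description, selectors):
    """Add one further selector on an attribute not yet used in the description."""
    return [dict(description, **{a: v}) for a, v in selectors if a not in description]


def search(records, selectors, quality, depth=2, beam=3, min_quality=0.2):
    """Iterated general-to-specific, generate-and-test search."""
    parents, found = [{}], []
    for _ in range(depth):
        candidates = []
        for parent in parents:                       # generate
            for child in specializations(parent, selectors):
                if child not in candidates:
                    candidates.append(child)
        scored = [(quality(records, c), c) for c in candidates]   # test
        scored = [sc for sc in scored if sc[0] >= min_quality]
        if not scored:
            break                                    # no further interesting subgroup
        scored.sort(key=lambda sc: sc[0], reverse=True)
        found.extend(scored)
        parents = [c for _, c in scored[:beam]]      # select next parents
    return sorted(found, key=lambda sc: sc[0], reverse=True)


def share_gain(records, description):
    """Stand-in quality: (p - p0) * sqrt(n) for the target migration = high."""
    members = [r for r in records if all(r[a] == v for a, v in description.items())]
    if not members:
        return 0.0
    p0 = sum(r["migration"] == "high" for r in records) / len(records)
    p = sum(r["migration"] == "high" for r in members) / len(members)
    return (p - p0) * sqrt(len(members))


# Invented toy data: (unemployment, crossed by a railway line, migration).
rows = [("high", "yes", "high"), ("high", "yes", "high"), ("high", "no", "low"),
        ("low", "yes", "low"), ("low", "no", "low"), ("low", "no", "low"),
        ("low", "yes", "low"), ("high", "no", "high")]
records = [{"unemployment": u, "rail": r, "migration": m} for u, r, m in rows]
selectors = [("unemployment", "high"), ("unemployment", "low"),
             ("rail", "yes"), ("rail", "no")]
results = search(records, selectors, share_gain)
```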

The statistical significance of a subgroup is evaluated by a quality 
function. As a standard quality function, SubgroupMiner uses the classical 
binomial test to verify if the target share is significantly different in a 
subgroup: 



$$\frac{p - p_0}{\sqrt{p_0 (1 - p_0)}} \, \sqrt{n} \, \sqrt{\frac{N}{N - n}}$$ 

This z-score quality function, based on comparing the target group share 
in the subgroup ($p$) with the share in its complementary subset, balances 
four criteria: size of the subgroup ($n$), relative size of the subgroup with 
respect to the total population size ($N$), difference of the target shares 
($p - p_0$), and the level of the target share in the total population ($p_0$). 
The quality function is symmetric with respect to the complementary 
subgroup. It is equivalent to the $\chi^2$-test of dependence between 
subgroup S and target group T, and to the correlation coefficient for the 
(binary) subgroup and target group variables. For continuous target variables 
and the deviating mean pattern, the quality function is similar, using mean 
and variance instead of the share $p$ and the binary case variance 
$p_0(1 - p_0)$. 

To evaluate a subgroup description, a contingency table is statistically 
analyzed (Table 2). It is computed for the extension of the subgroup 
description in the target object class. To get these numbers, a multirelational 
query is forwarded to the database. Contingency tables must be calculated 
efficiently for the very large number of subgroups evaluated during a search 
task. 
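
As a worked illustration, the quality function can be evaluated on the counts of Table 2 (16 of the 35 districts with high unemployment have high migration; 63 of all 578 districts do). The Python sketch below also checks the two properties stated above: the value for the complementary subgroup has the opposite sign and the same magnitude, and the squared value equals the $\chi^2$ statistic of the 2x2 table. The function and argument names are our own.

```python
from math import sqrt


def binomial_quality(k, n, T, N):
    """k: target objects in the subgroup, n: subgroup size,
    T: target objects in the population, N: population size."""
    p, p0 = k / n, T / N
    return (p - p0) / sqrt(p0 * (1 - p0)) * sqrt(n) * sqrt(N / (N - n))


def chi_square_2x2(k, n, T, N):
    """Pearson chi-square statistic of the 2x2 contingency table."""
    a, b, c, d = k, n - k, T - k, N - n - T + k
    return N * (a * d - b * c) ** 2 / (n * (N - n) * T * (N - T))


# Counts taken from Table 2.
q = binomial_quality(16, 35, 63, 578)
q_complement = binomial_quality(63 - 16, 578 - 35, 63, 578)
chi2 = chi_square_2x2(16, 35, 63, 578)
```

For these counts the quality is about 6.8, so the target share in the subgroup (16/35, roughly 46%) deviates strongly from the share in the population (63/578, roughly 11%).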



Table 2. Contingency table for target migration=high vs. unemployment=high 

                                        Target 
Subgroup                migration=high    ¬migration=high    Total 
unemployment=high             16                 19             35 
¬unemployment=high            47                496            543 
Total                         63                515            578 



We use a two-layer implementation, where the evaluation of contingency 
tables is done in SQL, while the search manager is implemented in Java. A 
sufficient statistics approach is applied by which a single query provides the 
aggregates that are sufficient to evaluate the whole set of successor 
subgroups. In the data server layer, all contingency tables needed for the 
next search level are calculated within one pass over the database. Thus the 
database is not queried for each single hypothesis; instead, the (next) 
population of hypotheses is treated together to optimize the data access and 
aggregation needed by these hypotheses. The search manager receives only 
aggregated data from the database so that network traffic is reduced. Besides 
offering scaling potential, such an approach has the advantages of ease of 
development, portability, and parallelization possibilities. 

The central component of the query is the selection of the multirelational 
parent subgroup. This is why a representation of multirelational spatial 
subgroups in SQL is required. To generate the aggregations (cross tables) for 
a parent subgroup, a nested select expression is applied for multirelational 
parents. From the product table, first the expansion attribute(s), the key 
attribute of the primary table and the target attribute are projected and 
aggregates are calculated for the projection. Then the cross tables (target 
versus expansion attribute) are calculated. Efficient calculation of several 
cross tables, however, is difficult in SQL implementations. An obvious 
solution could be based on building the union of several group-by operations 
(of target and expansion attributes). Although, in principle, several parallel 
aggregations could be calculated in one scan over the database, this is not 
optimised in SQL implementations. Indeed, each union operation 
unnecessarily performs its own scan over the database. Therefore, to achieve 
a scalable implementation (at least for single relational and some subtypes of 
multirelational or spatial applications), the group-by operation has been 
replaced by explicit sum operations including case statements combining the 
different value combinations. Thus for each parent, only one scan over the 
database (or one joined product table) is executed. Further optimisations are 
achieved by combining those parents that are in the same joined product 
space (to eliminate unnecessary duplicate joins). 
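
The replacement of several group-by operations by explicit sums over case expressions can be sketched as follows. The example uses Python with the SQLite module of the standard library; the table, its columns, the six rows and the generated column names are invented, and the attribute values are interpolated into the SQL text under the assumption that they come from trusted metadata. It is an illustration of the idea, not the query generator of SubgroupMiner.

```python
import sqlite3


def cross_table_query(table, target, target_value, expansions):
    """Build one SELECT that returns all cells of all cross tables in one scan.

    expansions: dict mapping an expansion attribute to the list of its values.
    """
    is_target = f"{target} = '{target_value}'"
    columns = ["COUNT(*) AS n_total",
               f"SUM(CASE WHEN {is_target} THEN 1 ELSE 0 END) AS t_total"]
    for attribute, values in expansions.items():
        for value in values:
            cond = f"{attribute} = '{value}'"
            columns.append(
                f"SUM(CASE WHEN {cond} THEN 1 ELSE 0 END) AS n_{attribute}_{value}")
            columns.append(
                f"SUM(CASE WHEN {cond} AND {is_target} THEN 1 ELSE 0 END) "
                f"AS t_{attribute}_{value}")
    return f"SELECT {', '.join(columns)} FROM {table}"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE district (id INTEGER, migration TEXT, unemployment TEXT, cars TEXT)")
conn.executemany("INSERT INTO district VALUES (?, ?, ?, ?)", [
    (1, "high", "high", "low"), (2, "high", "high", "low"), (3, "low", "high", "high"),
    (4, "low", "low", "high"), (5, "high", "low", "low"), (6, "low", "low", "high"),
])

sql = cross_table_query("district", "migration", "high",
                        {"unemployment": ["high", "low"], "cars": ["high", "low"]})
cursor = conn.execute(sql)
row = cursor.fetchone()
stats = dict(zip([column[0] for column in cursor.description], row))
```

The single result row contains, for every candidate selector, the subgroup size (n_...) and the number of target objects in it (t_...). Together with n_total and t_total these are the sufficient statistics for the contingency tables of all successor subgroups of one parent.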

As in our prior example, we applied this algorithm within the SPIN! 
system to UK 1991 census data for Stockport, one of the ten districts in 
Greater Manchester, UK. Census data provide aggregated information on 
demographic attributes such as persons per household, cars per household, 
unemployment, migration, and long-term illness. Their lowest level of 
aggregation is the so-called enumeration district. Also available are detailed 
geographical layers, among them streets, rivers, buildings, railway lines, and 
shopping areas. The data were provided to the project by the project partners 
Manchester University and Manchester Metropolitan University. 

Assume we are interested in enumeration districts with a high migration 
rate. We want to find out how those enumeration districts are characterized, 
and especially what distinguishes them from other enumeration districts not 
having a high migration rate. Spatial subgroup discovery helps to answer this 
question by searching the hypothesis space for interesting deviation patterns 
with respect to the target attribute. 




Figure 5. Overview of the subgroups found, showing the subgroup descriptions (left). The bottom 
right side shows a detail view for the overlap of the concept C (e.g. located near a railway line) and 
the target attribute T (high unemployment rate). The window on the top right plots p(T|C) 
against p(C) for the subgroup selected on the left and shows isolines as theoretically discussed 
in [Klosgen, 1996]. 

The target attribute T is then high migration rate. A concept C found in 
the search is Enumeration districts with high unemployment crossed by a 
railway line. Note that this subgroup combines spatial and non-spatial 
features. The deviation pattern is that the proportion of districts satisfying 
the target T is higher in districts that satisfy pattern C than in the overall 
population (p(T|C) > p(T)). In other words, if a district is characterized by 
high unemployment (census data) and at the same time is crossed by a 
railway line (geographic data), then the migration rate is expected to be 
higher than normal. 

Another subgroup found, this time purely spatial, is Enumeration 
district crossed by motorway M60. This spatial subgroup induces a 
homogeneous cluster taking the form of a physical spatial object. Spatial 
objects can often act as causal proxies for causally relevant attributes that 
are not part of the search space. 

A third subgroup found, this time non-spatial, is Enumeration districts 
with low rate of households with 2 cars and low rate of married people. By 
spotting the subgroup on the map we note that it is a spatially inhomogeneous 
group, but with its center of gravity directed towards the center of Stockport. 




Figure 6. Enumeration districts satisfying the subgroup description C (high unemployment 
rate and crossed by a railway line) are highlighted with a thicker black line. Enumeration 
districts also satisfying the target (high migration rate) are displayed in a lighter color. 

The way data mining results are presented to the user is essential for their 
appropriate interpretation. We use a combination of cartographic and 
non-cartographic displays linked together through simultaneous dynamic 
highlighting of the corresponding parts. The user navigates in the list of 
subgroups (Figure 5), which are dynamically highlighted in the map window 
(Figure 6). As a mapping tool, the SPIN! platform integrates the 
CommonGIS system [Andrienko & Andrienko, 1999], whose strength lies 
in the dynamic manipulation of spatial statistical data. Figure 6 shows an 
example for the migrant scenario, where the subgroup discovery method 
reports a relation between districts with a high migration rate and high 
unemployment. 

Such an analysis, where results of data mining and interactive data
analysis are visualized simultaneously, can be used to make non-trivial
decisions in very diverse application areas, including decision making that
takes place in public and private sector organizations. In particular, this
approach offers great potential to improve decisions made by statistical
analysts, urban planners, environmental decision makers, and people working
in geomarketing, the management of natural and industrial hazards, nuclear
safety and radiation protection, and many other domains. Combining the
strengths of GIS and Data Mining in a Spatial Mining tool helps the
decision maker to back up her intuitive insights by sound statistics, and to
automatically explore patterns in the data that are invisible to the eye
because they live in high-dimensional spaces.



5. CONCLUSION 

We have described the general architecture of the SPIN! spatial data
mining platform. It integrates GIS and data mining algorithms that have been
adapted to spatial data. The choice of J2EE technology allows us to meet
requirements such as security, scalability, and platform independence in a
principled manner. The system is tightly integrated with an RDBMS and can
serve as a data access and transformation tool for spatial and non-spatial
data. The client has been implemented in Java using Swing components for its
visual interface. JBoss 3.0 has been used as the application server [JBoss
Application Server web site]. An Oracle 9i database has been used for spatial
data and workspace storage. In the future it would be very interesting to add
the following features to this architecture: persistent algorithms running with
no client, a web interface to data mining algorithms via a conventional
browser, data mining functionality as web services via the XML-based SOAP
protocol, and shared workspaces where components can belong to more than
one workspace.



7. EXERCISES AND PROBLEMS

1. Assume a dataset includes a set of wards (voting districts) represented by
polygons and a set of roads represented by lines. What numeric attributes
could be generated from these geographic data characterizing each district?
How can these attributes be generated by using the functions of an
object-relational database?

2. Assume we have found a subgroup of wards with an above-average value
for the attribute low_social (low_social=high), a below-average value for
the percentage of married people (married=low) and an above-average
value for unemployed men (unempl_male=high). This subgroup is
characterized by an average value for the Carstairs deprivation index of 6.24
compared to the overall average of 0.94. What conclusions can be drawn
from this subgroup by such authorities as departments of national and
local government or providers of health and education services?

3. Under the assumptions of the previous exercise, suppose that we have
additional geographic data such as roads, cities, rivers, railway lines, and
bus stops. These objects can be visualized on the map where the Carstairs
deprivation index is represented by color for each ward. In addition,
suppose that wards from the selected interesting subgroup are highlighted.
What kind of interesting relationships could be visually discovered?

4. Suppose that by analyzing census data we found that such characteristics
as high long-term illness (LONG_TERM_ILLNESS=High), a high number of persons
per household (PERSON_PER_HHOLD=High) and a low relative density
of water resources (WATERDET_NUM_REL=Low) are incompatible,
i.e., there exists no ward with such characteristics. What rule could be
generated from this information?



Abstract: The Predictive Model Markup Language (PMML) is an XML-based industrial
standard for the platform- and system-independent representation of data
mining models. It is currently supported by a number of knowledge discovery
systems. The primary purpose of the PMML standard is to separate model
generation from model storage in order to enable users to view, post-process,
and utilize data mining models independently of the tool that generated the
model. In this chapter, a short introduction to PMML is followed by the
presentation of VizWiz. VizWiz is a tool for the visualization and evaluation
of data mining models that are specified in PMML. This tool allows for the
highly interactive visual exploration of a variety of data mining result types
such as decision trees, classification and association rules, or subgroups. A
noteworthy contribution of this work is that most of these result types can be
presented to the user in the same manner, thus reducing the learning effort
for the user and removing some of the jargon that often prevents application
experts from using knowledge discovery tools.



Key words: Data Mining, Knowledge Discovery, PMML, XML, Visualization 



1. INTRODUCTION 

The Predictive Model Markup Language (PMML) aims at defining a
common representational language for data mining models. Data mining
models are typically the results of the modeling phase of the knowledge
discovery process.¹ These results (a.k.a. models) are then evaluated and, if
found valuable by the user, incorporated into operational systems via SQL
statements or C programs. The focus of this chapter is the phase between the
generation of the model and its deployment. During this phase, numerous
models are typically evaluated visually and experimentally; most are
discarded, some are modified (manually or automatically) and then
re-evaluated, and, finally, very few models are deployed. This phase can be
seen as an exploratory phase, with the difference that in the field of
information visualization data are typically explored, while in this case
models are explored.

Shneiderman [Shneiderman, 2002] makes four recommendations
regarding the development of discovery tools. The thesis of this chapter is
that the tool described below, called VizWiz, follows these
recommendations:

1. Integrate data mining and information visualization: VizWiz is
not a data mining tool, but rather an information visualization tool.
However, in a certain sense it does combine these two techniques
since it visualizes data mining results and as such uses information
visualization techniques as a post-processing mechanism for data
mining.

2. Allow users to specify what they are seeking and what they find
interesting: The highly interactive graphical user interface of VizWiz
enables users to quickly zoom in on those (parts of) models that are
most interesting to them. Overview plots showing multiple models
can be utilized to identify those models best suited for the purpose.
For example, a user may prefer models with high coverage over
extremely accurate models or vice versa.

3. Support collaboration: The input and output² format of VizWiz is an
XML-based standard that is already supported by a variety of other
tools such as IBM’s Intelligent Miner or Clementine from SPSS.
Users working jointly on a specific analysis problem, for example in a
medical application of high public interest, can easily exchange their
(preliminary) models. Furthermore, analysis experts using these rather
complex systems can utilize VizWiz to present their results to
application experts who may not have access to these complex
knowledge discovery tools. Finally, Java technology in VizWiz
allows for the presentation of data mining models on the Internet, thus
enabling wide dissemination of interesting findings.



¹ The CRISP-DM process model [Chapman et al., 2000] divides the knowledge discovery
process into six phases: business understanding, data understanding, data preparation,
modeling, evaluation, and deployment.

² VizWiz can write modified or enhanced PMML files.

4. Respect human responsibility: VizWiz is a powerful post-processing
tool that combines visualization, evaluation and editing
features in order to give the human user maximal influence on the
final outcome of the analysis.

VizWiz was therefore developed as a highly interactive tool for the
visualization and evaluation of data mining results. This tool utilizes
information visualization techniques to enable application experts, who are
not necessarily experts in knowledge discovery, to explore, evaluate and
select those data mining results that are most suited for their purpose. The
primary arguments for this enabling technology are that the number and
complexity of data mining methods are significantly higher than the number
of distinct model types and that the model generation process is much more
complex than the model understanding process, especially when models are
properly visualized.

This chapter is organized as follows: Section 2 provides a short 
introduction to PMML with the aid of an example. The relevance of PMML 
for visualization tools is highlighted in Section 3 with the introduction of 
VizWiz. Related work is discussed in Section 4. The chapter concludes with 
a discussion on the (potential) impact of PMML and VizWiz in Section 5. 



2. THE PREDICTIVE MODEL MARKUP LANGUAGE

The Predictive Model Markup Language (PMML) is an XML markup
language that can be used to describe statistical and data mining models. The
most recent version (2.1) has been developed by the Data Mining Group
[Data Mining Group, 2003], a group of more than 20 vendors of data mining
software packages. Prominent members of the group are, for example, IBM,
SPSS, SAS, and NCR. A large variety of model types is already supported
(decision and regression trees, neural networks, clustering, regression, naive
Bayes, and association and sequence rules); further developments are under
way.

Figure 1 lists the XML source code for a simple decision tree. Detailed
documentation on the syntax of PMML can be downloaded from the DMG
web site [Data Mining Group, 2003] and shall not be the subject of this
chapter. The PMML example shown in Figure 1 contains no information
about the performance of this decision tree on any compatible data set. It
simply specifies, for each node, the class that this node would predict. A
visualization of this decision tree is shown in Figure 2.

PMML separates model generation from model visualization and
evaluation, which should lead to simpler, cheaper, and more robust tools for
producing data mining models, and for their subsequent processing. For the
purposes of this chapter, PMML should be seen as the vehicle that bridges
the gap between data mining tools and (information) visualization. The
potential of this bridge is that advanced visualization techniques can be
applied for the visualization of data mining results without requiring the
developers of such techniques to incorporate their software into any specific
data mining tool.

<?xml version="1.0"?>
<PMML version="2.1">
  <Header copyright="www.dmg.org"
          description="A very small binary tree model to show structure."/>
  <DataDictionary numberOfFields="5">
    <DataField name="temp." optype="continuous"/>
    <DataField name="humidity" optype="continuous"/>
    <DataField name="windy" optype="categorical">
      <Value value="true"/> <Value value="false"/>
    </DataField>
    <DataField name="outlook" optype="categorical">
      <Value value="sunny"/> <Value value="overcast"/>
      <Value value="rain"/>
    </DataField>
    <DataField name="whatIdo" optype="categorical">
      <Value value="will play"/> <Value value="may play"/>
      <Value value="no play"/>
    </DataField>
  </DataDictionary>
  <TreeModel modelName="golfing" functionName="classification">
    <MiningSchema>
      <MiningField name="temp."/> <MiningField name="humidity"/>
      <MiningField name="windy"/> <MiningField name="outlook"/>
      <MiningField name="whatIdo" usageType="predicted"/>
    </MiningSchema>
    <Node score="will play"> <True/>
      <Node score="will play">
        <SimplePredicate field="outlook" operator="equal" value="sunny"/>
        <Node score="will play">
          <CompoundPredicate booleanOperator="and">
            <SimplePredicate field="temp." operator="lessThan" value="90"/>
            <SimplePredicate field="temp." operator="greaterThan" value="50"/>
          </CompoundPredicate>
          <Node score="will play">
            <SimplePredicate field="humidity" operator="lessThan" value="80"/>
          </Node>
          <Node score="no play">
            <SimplePredicate field="humidity" operator="greaterOrEqual" value="80"/>
          </Node>
        </Node>
        <Node score="no play">
          <CompoundPredicate booleanOperator="or">
            <SimplePredicate field="temp." operator="greaterOrEqual" value="90"/>
            <SimplePredicate field="temp." operator="lessOrEqual" value="50"/>
          </CompoundPredicate>
        </Node>
      </Node>
      <Node score="may play">
        <CompoundPredicate booleanOperator="or">
          <SimplePredicate field="outlook" operator="equal" value="overcast"/>
          <SimplePredicate field="outlook" operator="equal" value="rain"/>
        </CompoundPredicate>
        <Node score="may play">
          <CompoundPredicate booleanOperator="and">
            <SimplePredicate field="temp." operator="greaterThan" value="60"/>
            <SimplePredicate field="temp." operator="lessThan" value="100"/>
            <SimplePredicate field="outlook" operator="equal" value="overcast"/>
            <SimplePredicate field="humidity" operator="lessThan" value="70"/>
            <SimplePredicate field="windy" operator="equal" value="false"/>
          </CompoundPredicate>
        </Node>
        <Node score="no play">
          <CompoundPredicate booleanOperator="and">
            <SimplePredicate field="outlook" operator="equal" value="rain"/>
            <SimplePredicate field="humidity" operator="lessThan" value="70"/>
          </CompoundPredicate>
        </Node>
      </Node>
    </Node>
  </TreeModel>
</PMML>

Figure 1. A PMML example defining a decision tree for the “play/don’t play” classification
task first defined in [Quinlan, 1993]. This example has been taken from the DMG web site
[Data Mining Group, 2003].
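
The nesting of nodes and predicates in Figure 1 can be read as a simple recursive procedure: starting at the root, descend into the first child whose predicate holds, and report the score of the last node reached. The following Python sketch illustrates this reading on an abbreviated copy of the tree (only the “sunny” branch is kept). The function names holds and predict are illustrative and not part of PMML or VizWiz; only the "and" and "or" compound operators are handled, and returning the score of the last matching node when no child matches is a simplifying assumption rather than a statement about the PMML specification.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the Figure 1 tree: only the "sunny" branch is kept.
PMML = """<PMML version="2.1">
  <TreeModel modelName="golfing" functionName="classification">
    <Node score="will play"> <True/>
      <Node score="will play">
        <SimplePredicate field="outlook" operator="equal" value="sunny"/>
        <Node score="will play">
          <CompoundPredicate booleanOperator="and">
            <SimplePredicate field="temp." operator="lessThan" value="90"/>
            <SimplePredicate field="temp." operator="greaterThan" value="50"/>
          </CompoundPredicate>
        </Node>
        <Node score="no play">
          <CompoundPredicate booleanOperator="or">
            <SimplePredicate field="temp." operator="greaterOrEqual" value="90"/>
            <SimplePredicate field="temp." operator="lessOrEqual" value="50"/>
          </CompoundPredicate>
        </Node>
      </Node>
    </Node>
  </TreeModel>
</PMML>"""

# Comparison operators used in the example; numeric ones compare as floats.
OPS = {
    "equal": lambda a, b: str(a) == b,
    "lessThan": lambda a, b: float(a) < float(b),
    "lessOrEqual": lambda a, b: float(a) <= float(b),
    "greaterThan": lambda a, b: float(a) > float(b),
    "greaterOrEqual": lambda a, b: float(a) >= float(b),
}


def holds(pred, record):
    """Evaluate a True, SimplePredicate or CompoundPredicate element."""
    if pred.tag == "True":
        return True
    if pred.tag == "SimplePredicate":
        op = OPS[pred.get("operator")]
        return op(record[pred.get("field")], pred.get("value"))
    if pred.tag == "CompoundPredicate":
        results = [holds(p, record) for p in pred]
        if pred.get("booleanOperator") == "and":
            return all(results)
        if pred.get("booleanOperator") == "or":
            return any(results)
    raise ValueError("unsupported predicate: " + pred.tag)


def predict(node, record):
    """Descend into the first child whose predicate holds; return the last score."""
    for child in node.findall("Node"):
        pred = next(c for c in child if c.tag != "Node")
        if holds(pred, record):
            return predict(child, record)
    return node.get("score")  # assumption: fall back to the current node


root = ET.fromstring(PMML).find("TreeModel/Node")
print(predict(root, {"outlook": "sunny", "temp.": 72}))
print(predict(root, {"outlook": "sunny", "temp.": 95}))
```

With the two sample records, the first call ends in the “will play” leaf and the second in the “no play” leaf, which matches a manual reading of the tree.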

In the field of information visualization, PMML should motivate
researchers to advance the state of the art in model visualization as opposed
to the visualization of (abstracted) data that is typically the subject of this
field. The primary difference is that data have already been abstracted by
data mining, but in most cases, further abstraction will be needed to support
users in picking out the right model(s) for their tasks at hand. A further
relevant aspect of PMML is that the XML code representing data mining
models can be extended with visualization-specific information in order to
store user preferences and/or information that is relevant for user adaptation.
In particular, XML style sheets can support different visualization
modes for different user groups. For example, technical users may be shown
more enriched visualizations while business users may be presented with
standard business charts that show less detail.

Figure 2. VizWiz visualization of a decision tree, based on the DMG example shown in
Figure 1. The bars on top of each node indicate the class predicted by this node and the text in
each node lists the conditions that must be satisfied to reach this node.



3. VIZWIZ: INTERACTIVE VISUALIZATION AND EVALUATION

VizWiz is a tool for the visualization and evaluation of data mining
models. It is written in Java and can be run as an applet with the Java
Runtime Environment, version 1.4. VizWiz reads models in PMML format and
can write modified models back to PMML. The graphical user interface of
VizWiz offers two primary viewing options to the user: one can either view
the relative performance of each model on a Receiver Operating
Characteristic (ROC) [Provost & Fawcett, 2001] plot (Figure 3) or one can view
a detailed graphical rendering of each model (Figure 4). The first view can
thus serve as an overview window that supports users in quickly zooming in on
those models that are most interesting to them. The second viewing option
can be used to learn more about each model, to edit the model and to test the
model on selected or all data records of a given test data set. VizWiz
currently offers interactive visualizations for the following model types:

• Linear regression 

• Decision and regression trees

• Association rules

• Propositional and first-order rules³

• Subgroups 

The last two model types (rules and subgroups) are not explicitly
supported by PMML and were therefore encoded in the format of multivariate
decision trees with minor extensions. The output of a multitude of
data mining and machine learning algorithms is therefore represented and
visualized in a coherent manner. This not only significantly reduced the
coding effort for VizWiz, but also relieved the user of the burden of
understanding the output of such diverse systems as decision tree learners
and ILP-based⁴ rule learners.

³ Propositional rules are typically of the form “if variable_a = value_1 and variable_b =
value_2 then class = class_1”; more complicated conditions are of course possible and
supported. First-order rules are typically of the form “if pred_1(A,B) and pred_2(B,C) then
class_1(A)” where pred_1 and pred_2 are predicates such as “father_of” and A, B, C, and D
are variables.







Figure 3. The ROC curve for six models generated from the Cleveland Heart Disease domain
[Blake, Keogh & Merz, 2002]

Figure 3 summarizes the predictive performance of six data mining
algorithms on the Cleveland Heart Disease domain [Blake, Keogh & Merz,
2002] with the help of a ROC [Provost & Fawcett, 2001] plot. The results
were produced using the machine learning toolbox WEKA [Witten & Frank,
1999]⁵; default parameters were used in all cases. The plot shows the “true
positive” rate against the “false positive” rate. The ideal point on this plot
is the upper left corner. A model reaching this point would correctly classify
all positive instances and would not classify any negative instance as
belonging to the positive class. The points (“model performance”) lying on or
near the outer hull of the curve connecting the points (0, 0) and (100, 100)
(“Naive Bayes” and “Neural Network” in the case of Figure 3) indicate the
best models. Depending on the characteristics of the particular task at hand,
a user should use one of these two models or a combination thereof.
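
To make the notion of the outer hull concrete, the following Python sketch computes ROC points from confusion-matrix counts and then the upper convex hull between (0, 0) and (100, 100). The model names and counts are invented for illustration and are not the results shown in Figure 3; the function names are likewise illustrative and not taken from VizWiz.

```python
def roc_point(tp, fp, fn, tn):
    """Return (false positive rate, true positive rate) in percent."""
    return (100.0 * fp / (fp + tn), 100.0 * tp / (tp + fn))


def cross(o, a, b):
    """Cross product of the vectors o->a and o->b (positive: left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(points):
    """Upper convex hull of the ROC points, including (0,0) and (100,100)."""
    pts = sorted(set(points) | {(0.0, 0.0), (100.0, 100.0)})
    hull = []
    for p in pts:
        # Remove the last hull point while it lies on or below the
        # straight line from its predecessor to the new point p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


# Hypothetical confusion-matrix counts (tp, fp, fn, tn) for three models.
models = {
    "model_a": (80, 20, 20, 80),
    "model_b": (60, 10, 40, 90),
    "model_c": (50, 40, 50, 60),
}
points = {name: roc_point(*counts) for name, counts in models.items()}
hull = upper_hull(points.values())
best = sorted(name for name, pt in points.items() if pt in hull)
print(hull)
print(best)
```

In this invented example, model_c lies below the hull and would therefore never be preferred over a combination of the other two models, whatever the relative cost of false positives and false negatives.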



⁴ ILP: inductive logic programming

⁵ Software that converts WEKA output to PMML is currently under development.

Ongoing research investigates how this technique can be extended to
problems with more than two classes. Currently, VizWiz plots the performance
of all evaluated models for one particular class against all other classes and
shows multiple overview graphs, one for each class.

The PMML file used to generate the visualization shown in Figure 4
contains information about the performance of the decision tree on a specific
data set, and the visualization shows this additional information in the
differently patterned⁶ bars on top of each tree node. The bar on the top left of
each node shows the number of instances of the class predicted by this node
that are covered by this node. The bar (or bars in the case of more than two
classes) on the right-hand side indicates the number of exceptions or
misclassifications made by this node. The user can thus quickly
discover relatively pure nodes that cover a large number of instances and are
therefore of most interest. The user can interactively open and close tree
nodes by clicking on the symbol to the left of each internal node. The tree
can also be viewed in vertical mode, which is more convenient for very large
trees. Further interactive features are supported for node editing and
viewing: change conditions, remove or add an internal node or subtree, show
more details, and print node conditions as SQL. When a compatible data set is
loaded into VizWiz (as in Figure 4), the user can also click on a node to
receive a table of all instances that are covered by this node. Alternatively,
the user can click on a specific data record and the path to the tree node that
classifies this instance will be highlighted.

Figure 4. VizWiz detail view of the decision tree model for the Cleveland domain. The panel
on the left-hand side shows selected records of the test data set that was used to generate the
ROC curve shown in Figure 3.
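
Printing node conditions as SQL amounts to joining the conditions on the path from the root to a node into a WHERE clause. The Python sketch below shows one way this could be done; it is not VizWiz's actual output format. The table name records, the tuple representation of a condition, and the function names are assumptions made for illustration; the operator names follow the PMML SimplePredicate vocabulary used in Figure 1.

```python
# Mapping from PMML SimplePredicate operators to SQL comparison operators.
SQL_OPS = {
    "equal": "=",
    "notEqual": "<>",
    "lessThan": "<",
    "lessOrEqual": "<=",
    "greaterThan": ">",
    "greaterOrEqual": ">=",
}


def sql_literal(value):
    """Render numbers as-is and everything else as a quoted SQL string."""
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def path_to_sql(path, table="records"):
    """Build a SELECT statement for the records that reach a tree node.

    `path` is a list of (field, operator, value) conditions collected on
    the way from the root to the node (a hypothetical representation).
    """
    clauses = [
        '"{}" {} {}'.format(field.replace('"', '""'), SQL_OPS[op], sql_literal(value))
        for field, op, value in path
    ]
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return 'SELECT * FROM "{}" WHERE {}'.format(table, where)


print(path_to_sql([("outlook", "equal", "sunny"), ("humidity", "lessThan", 80)]))
```

Quoting the column names matters for fields such as temp. in Figure 1, whose name is not a plain SQL identifier.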



⁶ These bars are colored in reality, but have been changed to patterns for easier viewing in the
printed version.

Figure 5. VizWiz visualization of a set of propositional rules

Figure 5 shows a set of ordered rules that were produced by C4.5-rules, a 
combined decision tree / rule learner that first creates a decision tree and 
then generates a set of ordered rules from the tree. The rule set is visualized 
as a horizontal tree of depth two. C4.5-rules orders rules by classes such that 
they can be visualized in separate sub-trees. The first branch in Figure 5 
shows all rules that predict the class “no,” while the second branch predicts 
the class “yes.” A further interactive feature of VizWiz is also shown in this 
figure: the display of rule / node statistics can be changed from bars to pie 
charts. Bars convey more information since they also indicate the number of 
instances covered by the rule, but pie charts are more expressive in showing 
the relative impurity of a rule, i.e. the ratio between correctly predicted and 
incorrectly predicted instances. The fact that the rules are ordered is 
indicated by the indentation of the bars for each rule. Only the instances that 
are not covered by previous rules may be covered by subsequent rules. 
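
The effect of rule ordering on the per-rule statistics can be sketched as follows: each record is assigned to the first rule whose condition it satisfies, and only that rule's counters are updated. The rules and records in this Python sketch are invented for illustration and are not those of Figure 5; the function name is likewise an assumption.

```python
def evaluate_ordered_rules(rules, records):
    """Count covered and correctly classified records per rule.

    `rules` is an ordered list of (condition, predicted_class) pairs.
    A record is counted only for the first rule that covers it.
    """
    stats = [{"covered": 0, "correct": 0} for _ in rules]
    for record in records:
        for i, (condition, label) in enumerate(rules):
            if condition(record):
                stats[i]["covered"] += 1
                stats[i]["correct"] += int(record["class"] == label)
                break  # later rules never see this record
    return stats


# Hypothetical ordered rule set and test records.
rules = [
    (lambda r: r["outlook"] == "rain", "no"),
    (lambda r: r["humidity"] < 80, "yes"),
]
records = [
    {"outlook": "rain", "humidity": 70, "class": "no"},
    {"outlook": "rain", "humidity": 60, "class": "yes"},
    {"outlook": "sunny", "humidity": 70, "class": "yes"},
    {"outlook": "sunny", "humidity": 90, "class": "no"},
]
for (_, label), s in zip(rules, evaluate_ordered_rules(rules, records)):
    print(label, s["covered"], s["correct"])
```

The second record satisfies the condition of both rules but is counted only for the first one, which is exactly what the indentation of the bars in Figure 5 is meant to convey.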

An extension of the standard PMML format for decision trees supports
the display of first-order rules as shown in Figure 6. First-order rules in
particular are sometimes hard to read for users who are not accustomed to
Prolog notation. VizWiz was therefore extended with a facility to provide a
pretty-print representation for rules. In Figure 6, some of the conditions are
still in their original form (for example, “gender(Actor, female)”) while others
have been replaced by natural language (“Actor beams down to planet with kirk”).
The mapping is defined in the PMML file, and variables and constants are
replaced automatically by their proper values.
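
A pretty-print facility of this kind can be thought of as a table of templates indexed by predicate name. The Python sketch below illustrates the idea only; the predicate name beams_down_with, the template syntax, and the function name are hypothetical and do not reflect how the mapping is actually encoded in the PMML extension.

```python
import re

# Hypothetical mapping from predicate names to natural-language templates;
# {0}, {1}, ... are replaced by the arguments of the literal.
TEMPLATES = {
    "beams_down_with": "{0} beams down to planet with {1}",
}


def pretty(literal):
    """Replace a Prolog-style literal by its template, if one is defined."""
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", literal)
    if match is None:
        return literal
    name = match.group(1)
    args = [arg.strip() for arg in match.group(2).split(",")]
    template = TEMPLATES.get(name)
    # Literals without a template are shown in their original form.
    return template.format(*args) if template else literal


print(pretty("beams_down_with(Actor, kirk)"))
print(pretty("gender(Actor, female)"))
```

Literals without a template, such as gender(Actor, female), are passed through unchanged, which corresponds to the mixed display described above.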



Figure 6. VizWiz visualization of a set of hand-crafted first-order rules predicting whether an
actor will appear more than once in the original Star Trek series

Association rules are explicitly supported by their own PMML format
that was one of the first formats to be agreed upon by the DMG [Data
Mining Group, 2003]. The confidence and support values of each rule are of
foremost interest when inspecting association rules, and this is reflected by
the slightly modified (as compared to trees and classification rules)
visualization of association rules shown in Figure 7. There is now only a single
bar on top of each rule that denotes the number of instances that support this
rule. The shade of the bar denotes the confidence value, with a lighter shade
indicating a higher confidence value. A number of controls (not shown) are
available to filter rules such that the user can restrict the visualization to
those rules that are within a specific range of confidence and support values.
Such interaction features are essential when working with association rules
since most association rule learners typically output a very large number of
rules.
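
The filtering step described above reduces to a range check on two numbers per rule. The following Python sketch shows such a filter; the Rule class, the function name, and the example rules are illustrative assumptions, and support and confidence are taken to be fractions between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    antecedent: tuple
    consequent: tuple
    support: float      # fraction of instances supporting the rule
    confidence: float   # fraction of antecedent matches that also match the consequent


def filter_rules(rules, min_support=0.0, max_support=1.0,
                 min_confidence=0.0, max_confidence=1.0):
    """Keep only rules whose support and confidence lie in the given ranges."""
    return [
        r for r in rules
        if min_support <= r.support <= max_support
        and min_confidence <= r.confidence <= max_confidence
    ]


# Hypothetical rule set.
rules = [
    Rule(("bread",), ("butter",), support=0.30, confidence=0.80),
    Rule(("beer",), ("chips",), support=0.05, confidence=0.90),
    Rule(("milk",), ("bread",), support=0.40, confidence=0.50),
]
selected = filter_rules(rules, min_support=0.10, min_confidence=0.70)
for r in selected:
    print(r.antecedent, "->", r.consequent, r.support, r.confidence)
```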

The last model type visualized by VizWiz that is discussed in this chapter
is subgroups, also known as deviation patterns. A typical subgroup
discovery algorithm is MIDOS [Wrobel, 1997]. Subgroups may be used as
classification systems, but typically they are used in tasks with highly
skewed distributions such as the analysis of the return rate of mailing
campaigns. The quality of a subgroup is essentially determined by its
deviation from the norm and not by its predictive accuracy. Typically, a
higher deviation is desired, since it denotes subgroups with very different
behavior. The visualization of the subgroups shown in Figure 8 therefore
shows the distribution of the target value within each group. Each group
(with the exception of the leftmost group that shows the entire data set) is
represented by a pie chart that is nested inside another pie chart that displays
the distribution of the target value in the entire data set. The area of the pie
chart corresponds to the size of the subgroup. This visualization method has
been chosen despite the inherent danger associated with all pie chart
visualizations, namely that the user incorrectly perceives exact group sizes.
Feedback from a variety of technical and non-technical users has indicated
that the intuitive appeal of pie charts outweighs the danger of
misinterpretation and that users inspect the actual numbers once they have
gained an overview through the pie charts.

Figure 7. VizWiz visualization of a set of hand-crafted association rules

Figure 8. Visualization of three subgroups in a multi-relational medical application. Shown
are the distributions for the entire data set (leftmost chart) and three selected subgroups. The
legend on top shows the size of the entire data set and the distribution of the two target values
'success' and 'fail'. The scrollable text below each subgroup shows the actual description of
this group. For example, the rightmost subgroup contains all single patients where the doctor
gave his diagnosis with a high confidence (as shown by the pop-up text).
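
The quantities behind such a nested pie chart are the target share in the subgroup, the target share in the entire data set, and the relative size of the subgroup. The Python sketch below computes them together with weighted relative accuracy, a quality measure from the subgroup discovery literature that combines size and deviation. Using this particular measure is an assumption made for illustration; the chapter does not state which quality function MIDOS or VizWiz uses, and the counts are invented.

```python
def subgroup_summary(n_total, pos_total, n_sub, pos_sub):
    """Summarize a subgroup relative to the entire data set.

    n_total, pos_total: size of the data set and number of target instances
    n_sub, pos_sub:     the same counts restricted to the subgroup
    """
    p0 = pos_total / n_total          # target share in the entire data set
    p = pos_sub / n_sub               # target share in the subgroup
    coverage = n_sub / n_total        # relative size (pie chart area)
    return {
        "overall_share": p0,
        "subgroup_share": p,
        "coverage": coverage,
        "deviation": p - p0,
        # Weighted relative accuracy: coverage times deviation.
        "wracc": coverage * (p - p0),
    }


# Hypothetical counts: 1000 records with 300 successes overall,
# and a subgroup of 100 records containing 60 successes.
summary = subgroup_summary(1000, 300, 100, 60)
for key, value in summary.items():
    print(key, round(value, 3))
```

A small subgroup with a large deviation and a large subgroup with a small deviation can receive similar scores under such a measure, which is one reason why showing both the distribution and the size in the visualization is useful.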



VizWiz is quite efficient in processing and displaying data mining
models and is capable of handling relatively large PMML files (i.e., decision
trees with thousands of nodes or association rule sets with hundreds of
rules). The time required for processing the actual PMML file is
negligible in comparison to the time required to plot the visualization on
screen. VizWiz has therefore been designed to initially display only small
portions of the model: in the case of decision trees, only the root is initially
shown; in the case of association rules, only those rules with the highest
confidence and support values are shown. The user may then choose to
display additional information, which typically happens in real time.



4. RELATED WORK 

The complexity of data mining results and the explorative nature of
knowledge discovery tasks have led to a number of visualization methods
for various data mining models [Thearling, Becker, DeCoste, Mawby, Pilote
& Sommerfield, 2001]. Most data mining packages contain visualizations for
decision trees. For example, WEKA [Witten & Frank, 1999] contains a
simple non-interactive tree viewer while SGI’s MineSet contains a highly
interactive 2.5-dimensional tree visualizer. The SNNS [Stuttgart Neural
Network Simulator, 2003] visualizes neural networks with particular focus
on the visualization of the training process. IBM’s Intelligent Miner features
a highly expressive visualization of its clustering method. It is common to
all of these systems that visualization is only one aspect of the workbench
and not its primary focus. Very few systems exist that specialize in
visualization and evaluation alone. This is primarily due to the lack of
representational standards for data mining results, which is now addressed by
PMML. The PEAR [Jorg, Pocas & Azevedo, 2002] system is one such system;
it was specifically designed as a web-based system for post-processing
PMML association rules. It offers further selection and interaction
mechanisms that go beyond VizWiz’s capabilities, but is limited to
association rules.

Gamberger, Lavrac, and Wettschereck [Gamberger, Lavrac & 
Wettschereck, 2002] have proposed an alternative method for the 
visualization of subgroups that may be implemented in VizWiz in the future. 

An alternative to the visualization of actual data mining models, and a
means for visualizing the outcome of model types that cannot be visualized
directly, is the visualization of class probability estimates as proposed by
[Rheingans & desJardins, 2000; Frank & Hall, 2003]. This method visualizes
the class predictions of classifiers in a two-dimensional space where the user
can interactively select the two dimensions to be displayed. Each class is
assigned a certain color and each pixel in the feature space is colored
according to a mixture of these colors depending on the probability of each
class at this pixel’s location.
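
The color of a single pixel in such a display can be obtained as a probability-weighted blend of the class colors. The Python sketch below shows one simple linear blend in RGB space; the cited works may use a different color model, and the function name and example colors are assumptions made for illustration.

```python
def mix_colors(probabilities, colors):
    """Blend RGB class colors weighted by the class probabilities.

    probabilities: one probability per class, summing to 1
    colors:        one (r, g, b) tuple per class, each channel in 0..255
    """
    return tuple(
        round(sum(p * color[channel] for p, color in zip(probabilities, colors)))
        for channel in range(3)
    )


RED, BLUE = (255, 0, 0), (0, 0, 255)

# A pixel where the classifier estimates P(class 1) = 0.75, P(class 2) = 0.25.
print(mix_colors((0.75, 0.25), (RED, BLUE)))
# A pixel where the classifier is certain about class 2.
print(mix_colors((0.0, 1.0), (RED, BLUE)))
```

Regions where the classifier is uncertain then show up as intermediate colors, while confident regions keep the pure class color.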
5. DISCUSSION

This chapter pursued two goals: (1) to introduce PMML to the
information visualization community and to motivate its use as a bridge
between the fields of data mining and information visualization and (2) to
introduce a visualization tool that fully utilizes PMML and highlights the
benefits that can be obtained when separating model generation from model
processing.

VizWiz has a clear focus on the visualization and evaluation of
classification models such as decision trees and classification rules. This
focus reflects the fact that these methods are among the most powerful
methods developed to date in data mining and that most commercial and
non-commercial knowledge discovery software packages incorporate such
methods. VizWiz is therefore potentially capable of handling the output of a
huge variety of algorithms, as long as this output is converted to PMML.
This implies a considerable benefit for algorithm developers and users:
developers need not worry about visualization; they can simply produce PMML
output and then use a tool such as VizWiz to visualize their results. Users
(“viewers”) of models can work with one single post-processing tool that is
not restricted to a given set of analysis methods and that offers coherent
visualization, evaluation and interaction methods, which is especially
beneficial to the technically less skilled user.

The impact of PMML on the field of data mining is clear: standardized
interfaces for data input (such as JDBC or ODBC) and for output (PMML)
enable researchers and developers to plug their tools into larger software
packages with less effort, thereby significantly increasing the potential user
community. The field of information visualization can benefit from PMML
as developers need not worry about understanding the output of specific
analysis methods, but can concentrate on developing new methods for model
browsing, selection, and editing. As such, visualization tools such as VizWiz
can serve as powerful decision support tools that may help to bridge the gap
between highly skilled knowledge discovery experts and application experts
or general information seekers.



7. EXERCISES AND PROBLEMS

1. Rewrite the PMML example in Figure 1 using the “SimpleSetPredicate”
(see http://www.dmg.org/v2-0/TreeModel.html) for the node:

<CompoundPredicate booleanOperator="or">
  <SimplePredicate field="outlook" operator="equal" value="overcast"/>
  <SimplePredicate field="outlook" operator="equal" value="rain"/>
</CompoundPredicate>



2. Reconstruct the PMML code for Figure 7. 

Advanced 

3. Figure 5 assumes an implicit ordering of the rules. Is it possible to 
encode this ordering in PMML explicitly and if so, how? 

4. What are the most useful (filter) controls a graphical user interface 
visualizing association rules should offer? Would these controls also be 
sensible for other model types? Does this assume the availability of 
additional data that is not part of the minimal PMML format?

Abstract: In this chapter, we describe new neural-network techniques developed for
visual mining of clinical electroencephalograms (EEGs), the weak electrical
potentials invoked by brain activity. These techniques exploit the fruitful ideas
of the Group Method of Data Handling (GMDH). Section 2 briefly describes the
standard neural-network techniques that are able to learn well-suited
classification models from data presented by relevant features. Section 3
introduces an evolving cascade neural network technique that adds new input
nodes as well as new neurons to the network while the training error decreases.
This algorithm is applied to recognize artifacts in the clinical EEGs. Section 4
presents the GMDH-type polynomial networks trained from data. We applied
this technique to distinguish the EEGs recorded from an Alzheimer patient and a
healthy patient as well as to recognize EEG artifacts. Section 5 describes the new
neural-network technique developed to derive multi-class concepts from data.
We used this technique for deriving a 16-class concept from the large-scale
clinical EEG data. Finally, we discuss perspectives of applying the neural-network
techniques to clinical EEGs.

Key words: Classification model, pattern visualization, neural network, cascade
architecture, feature selection, polynomial, electroencephalogram, decision
tree



1. INTRODUCTION 

Data mining, as a process of discovering interesting patterns and relations
in data presented by labeled examples, can be regarded as deriving
classification models, or classifiers, that assign an unknown example to one of
the given classes with acceptable accuracy. A typical classification problem
is presented by a data set of labeled examples that are characterized by
variables or features. Experts assume such features make a distinct
contribution to the classification problem. Such features are called relevant.
However, among these features, some may be irrelevant and/or redundant.
The former can seriously hurt the classification accuracy, whereas the latter are
useless for the classification and can obstruct understanding of how decisions
are derived. Both the irrelevant and redundant features have to be discarded.

In solving the classification problem, a user has to derive or learn a 
classification model from a training data set and test its performance on a 
testing data set of the labeled examples. These data must be disjoint in order 
to objectively evaluate how well the classification model can classify unseen 
examples. 

Besides that, users such as medical experts need to both classify unseen 
examples and verify decisions by analyzing the underlying causal relations 
between the involved features and the model outcome. Such an analysis can 
be comprehensively done by visualizing a discovered model and/or 
discovered patterns [Kovalerchuk & Vityaev, 2000]. Some data mining 
methods can provide the visualization of a classification model as well as 
associated patterns. For example, we may visualize a derived decision tree
model where each branch visualizes an individual pattern. However, using
neural-network techniques, we can visualize a derived network but cannot
visualize the interesting patterns. This issue is critical for medical experts 
who need to interpret data mining results using a visual form in addition to a 
textual description. 

In this chapter, we describe new neural-network techniques developed to 
provide both the visualization of classification models and the visualization 
of patterns. The advantages of these techniques are illustrated by mining 
medical data such as electroencephalograms (EEGs), the weak electrical 
potentials invoked by brain activity, whose spectral characteristics are taken 
as visual features. Within this chapter, we compare some existing data 
mining techniques with the new techniques with respect to the above
aspects of visualization. The results show that in addition to a textual 
presentation, EEG-experts can visually interpret discovered classification 
models and patterns. 

When applying data mining techniques, EEG-experts often cannot
reliably identify relevant features in advance and avoid irrelevant and redundant ones.
Besides that, some features become relevant only when taken into account in
combination with other features. In such cases data mining techniques
exploit a special learning strategy capable of selecting relevant features 
during the derivation of a classification model [Duda & Hart, 2000; Farlow, 
1984; Madala & Ivakhnenko, 1994; Muller & Lemke, 2003]. Such a strategy
allows experts to learn classification models more accurately than strategies
that select features before learning.

Surveying data mining methods, we see that most of those aimed at
extracting comprehensible models imply a trade-off between classification
accuracy and representation complexity [Avilo Garcez, Broda & Gabbay,
2001; Setiono, 2000; Towell & Shavlik, 1993]. Less work has been
undertaken to study methods capable of discovering the comprehensible 
models without decreasing their classification accuracy. 

Below we describe new neural-network techniques developed for the
visual mining of clinical EEGs. By exploiting the fruitful ideas of the Group
Method of Data Handling (GMDH) of Ivakhnenko [Madala & Ivakhnenko,
1994; Muller & Lemke, 2003], these techniques are able to derive
comprehensible classification models while at the same time keeping their
classification error down.

In Section 2, we briefly describe standard neural-network techniques,
including the cascade-correlation architecture, that are able to learn well-suited
classification models from data. These methods, however, cannot generalize
well in the presence of irrelevant and/or noisy features.

Section 3 introduces an evolving cascade neural-network technique that 
adds new input nodes as well as new neurons to the network while the 
training error decreases. The resultant networks have a near minimal number 
of input variables and hidden neurons, which allow classifying new 
examples well. We apply this algorithm to recognize artifacts in the clinical 
EEGs. 

Section 4 presents the GMDH-type polynomial networks that learn from 
data. These networks are represented as concise sets of short-term 
polynomials and can be presented in visual form. Moreover, the GMDH- 
type neural networks can generalize better than the standard fully connected 
neural networks. We apply this technique to distinguish the EEGs recorded 
from an Alzheimer patient and from a healthy patient as well as to recognize 
EEG artifacts. 

Section 5 describes the new decision tree neural-network technique 
developed to derive multi-class concepts from data. We use this technique 
for deriving a 16-class concept from the large-scale clinical EEG data 
recorded from sleeping newborns. This concept assists clinicians to predict 
some brain development pathologies of newborns. Finally, we discuss 
perspectives of applying the neural-network techniques to clinical EEGs.

2. NEURAL NETWORK BASED TECHNIQUES

In this section, we briefly describe a standard technique used in our 
experiments for training feed-forward neural networks (FNN) with a back- 
propagation algorithm. Then we describe the cascade-correlation 
architecture and end by discussing the shortcomings and advantages of these 
techniques. 

2.1 A Standard Neural-Network Technique 

A standard neural-network technique exploits a feed-forward fully 
connected network consisting of the input nodes, hidden and output neurons 
that are connected to each other by adjustable synaptic weights [Bishop,
1995]. This technique implies that a structure of neural network has to be 
predefined properly. This means that users must preset an appropriate 
number of the input nodes and hidden neurons and apply a suitable 
activation function. For example, the user may apply a sigmoid activation 
function described as 

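$y = f(\mathbf{x}, \mathbf{w}) = 1 / (1 + \exp(-w_0 - \sum_{i=1}^{m} w_i x_i))$, (1)

where $\mathbf{x} = (x_1, \ldots, x_m)^T$ is an $m \times 1$ input vector, $\mathbf{w} = (w_1, \ldots, w_m)^T$ is an $m \times 1$
synaptic weight vector, $w_0$ is a bias term, and $m$ is the number of input
variables.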
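As a minimal sketch of Eq. (1), the following Python function evaluates one sigmoid neuron. The function name `sigmoid_neuron` and the numeric values are illustrative assumptions and do not come from the chapter.

```python
import math

def sigmoid_neuron(x, w, w0):
    """Output of one sigmoid neuron: 1 / (1 + exp(-w0 - sum_i w_i * x_i))."""
    s = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-s))

# Hypothetical input and weight vectors with m = 3 (illustration only).
x = [0.5, -1.0, 2.0]
w = [0.1, 0.4, -0.3]
y = sigmoid_neuron(x, w, w0=0.2)
print(round(y, 4))
```

The output always lies strictly between 0 and 1, so a class label can be obtained by thresholding it, for example at 0.5.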

Then the user has to select a suitable learning algorithm and then 
properly set its parameters such as the learning rate and the number of the 
training epochs. Note that when the neural networks include at least two 
hidden neurons, the learning algorithms with error back-propagation usually 
provide the best performance in terms of classification accuracy
[Bishop, 1995]. 

Within the standard technique, first the learning algorithm initializes the 
synaptic weights w. The values of w are updated while the training error 
decreases for a given number of the training epochs. The resultant 
classification error is dependent on the given learning parameters as well as
on the initial values $w^0$ of neuron weights. For these reasons, neural
networks are trained several times with random values of initial weights and 
different learning parameters. This allows the user to avoid local minima and 
find a neural network with a near minimal classification error. 

After training, the user expects that the neural network can classify new 
inputs well and that its classification accuracy is acceptable. However the 
learning algorithm may fit the neuron weights to specifics of training data 
that are absent in new data. In this case, neural networks become over-fitted
and do not generalize well. Within the standard technique, the
generalization ability of the trained network is evaluated on a validation
subset of the labeled examples that have not been used for training the
network.

Figure 1 depicts a case when after k* training epochs the validation error
starts to increase while the training error continues to decrease. This means
that after k* training epochs the neural network has become over-fitted. To
prevent over-fitting, we can update the neuron weights only while the validation
error decreases.
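The stopping rule can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name `stop_epoch` and the validation curve are hypothetical, and the rule simply stops at the first epoch after which the validation error no longer decreases.

```python
def stop_epoch(val_errors):
    """Return k*, the number of epochs after which the validation error stops decreasing.

    val_errors[k - 1] is the validation error measured after epoch k.
    """
    k = 1
    while k < len(val_errors) and val_errors[k] < val_errors[k - 1]:
        k += 1
    return k

# Hypothetical validation curve: it falls for four epochs and then rises.
val = [0.42, 0.33, 0.27, 0.25, 0.28, 0.31]
k_star = stop_epoch(val)
print(k_star)  # prints 4
```

In practice a noisy validation curve may call for a more tolerant rule, but the principle is the same: the weights obtained after k* epochs are the ones that are kept.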
Figure 1. Learning curves for the training and validating sets

When classification problems are characterized in the m-dimensional
space of input variables, the performance of neural networks may be
radically improved by applying the Principal Component Analysis (PCA) to
training data [Bishop, 1995]. The PCA may significantly reduce the number
of the input variables and consequently the number of synaptic weights,
which are updated during learning. A basic idea behind the PCA is to rotate
the initial variables so that the classification problem might be resolved in a
reduced input space.

Figure 2 depicts an example of a classification problem resolved in a
two-dimensional space with input variables $x_1$ and $x_2$ by using a separating
function $f_1(x_1, x_2)$. However, we can rotate $x_1$ and $x_2$ so that this problem might
be solved in a one-dimensional input space of a principal component
$z_1 = a_1 x_1 + a_2 x_2$, where $a_1$ and $a_2$ are the coefficients of a linear transformation. In this
case a new separating function is $f_2(z_1)$ that is equal to 0 if $z_1 < \vartheta_1$ and equal
to 1 if $z_1 > \vartheta_1$, where $\vartheta_1$ is a threshold learned from the training data
represented by the new variable $z_1$.
Figure 2. An example of two-dimensional classification problem

As we see, the new components $z_1$ and $z_2$ make a different contribution to
the variance of the training data: the first component contributes much more
than the second does. This example demonstrates how the PCA can
rationally reduce the input space. However, users of PCA must properly
define a variance level and the number of components making a contribution
to the classification.
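The two-dimensional example can be sketched in plain Python. The data points, the function name `first_principal_axis`, and the choice of the threshold as the midpoint between the class means are assumptions made for illustration. The chapter only states that the threshold is learned from the training data.

```python
import math

def first_principal_axis(points):
    """Unit vector (a1, a2) along the direction of largest variance of 2-D points."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    s11 = sum((p[0] - m1) ** 2 for p in points) / n
    s22 = sum((p[1] - m2) ** 2 for p in points) / n
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / n
    angle = 0.5 * math.atan2(2.0 * s12, s11 - s22)  # orientation of the major axis
    return math.cos(angle), math.sin(angle)

# Hypothetical two-class data in the (x1, x2) plane.
points = [(1.0, 1.0), (2.0, 2.1), (1.5, 1.4), (4.0, 4.2), (5.0, 4.9), (4.5, 4.6)]
labels = [0, 0, 0, 1, 1, 1]

a1, a2 = first_principal_axis(points)
z1 = [a1 * x1 + a2 * x2 for x1, x2 in points]  # z1 = a1*x1 + a2*x2

# Assumed threshold rule: midpoint between the class means of z1.
mean0 = sum(z for z, t in zip(z1, labels) if t == 0) / labels.count(0)
mean1 = sum(z for z, t in zip(z1, labels) if t == 1) / labels.count(1)
threshold = 0.5 * (mean0 + mean1)

def f2(z):
    """Separating function in the one-dimensional space of z1."""
    return 1 if z > threshold else 0

predicted = [f2(z) for z in z1]
print(predicted)
```

For more than two input variables, the principal axes would normally be obtained from an eigendecomposition of the covariance matrix rather than from the closed-form angle used here.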

Thus by using the standard technique, we may find a suitable neural- 
network structure and then fit its weights to the training data while the 
validation error decreases. Each neural network with a given number of 
input nodes and hidden neurons should be trained several times, say 100 
times. 

Thus, we can see that the standard technique is computationally
expensive. For this reason, users apply fast learning algorithms such as the
Levenberg-Marquardt back-propagation algorithm [Bishop, 1995].

2.2 A Cascade-Correlation Architecture 

To solve classification and pattern recognition problems, [Fahlman & 
Lebiere, 1990] proposed a cascade-correlation architecture of neural 
networks. The neural networks with the cascade-correlation architecture 
differ from the above networks with a predefined structure. In contrast to the
networks of the previous section, cascade networks start learning with only one neuron. Then the
algorithm adds and trains new neurons, creating a multi-layer structure. The
new neurons are added to the networks as long as the residual classification 
error decreases. Thus, the cascade-correlation learning algorithm allows 
growing neural networks to a near optimal size required for good 
generalization [Farlow, 1984; Iba, deGaris & Sato, 1994; Madala & 
Ivakhnenko, 1994; Muller & Lemke, 2003]. 

Figure 3 depicts an example of cascade-correlation architecture
consisting of four input nodes $x_1, \ldots, x_4$, two hidden neurons $z_1$ and $z_2$, and
one output neuron y. The first hidden neuron is connected to all the input
nodes, and the output neuron is connected to all the input nodes as well as to
the hidden neurons $z_1$ and $z_2$.
Figure 3. An example of cascade-correlation architecture

The learning of a cascade-correlation architecture is based on the 
following ideas. The first idea is to build up the cascade architecture by 
adding new neurons connected to all the input nodes and previous hidden 
neurons. The second idea is that the learning algorithm attempts to reduce 
the residual error by updating weights of the new neuron, that is, each time 
only the output neuron is trained. The third idea is to add one-by-one new 
neurons to the network while its residual error decreases. 

The main steps of the learning cascade-correlation algorithm are 
described below. 

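nnet = [];                      % initialize an empty network
error = n;                      % worst case: all n examples are misclassified
new_error = error - 1;          % forces the first pass through the loop
while new_error < error
    error = new_error;          % remember the error before adding a neuron
    nnet = add_new_neuron(nnet);
    nnet = train_neuron(nnet, X, Y);
    new_error = calc_error(net(nnet, X) - Y);
end
nnet = cut(nnet);               % exclude the last neuron, which did not reduce the error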

Here n, X, and Y are the number of training examples, the input data and 
a target vector, respectively. The procedure cut excludes the last neuron 
from the trained cascade network and then returns the result to the nnet. 
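The growth rule can also be illustrated with a short Python sketch. Training is abstracted behind a hypothetical callback `error_with(k)`, which is assumed to add and train the k-th neuron and return the resulting number of misclassified examples. The callback, the `max_neurons` safety limit, and the error counts are all assumptions for illustration.

```python
def grow_cascade(error_with, n_examples, max_neurons=50):
    """Grow a cascade while each added neuron lowers the training error.

    Returns the number of neurons kept and the error of the kept network.
    """
    size = 0
    error = n_examples              # worst case: every example is misclassified
    while size < max_neurons:
        new_error = error_with(size + 1)
        if new_error >= error:      # the last neuron did not help, so it is "cut"
            break
        size, error = size + 1, new_error
    return size, error

# Hypothetical error counts after adding 1, 2, ... neurons.
errors = {1: 30, 2: 18, 3: 12, 4: 12, 5: 9}
size, error = grow_cascade(lambda k: errors[k], n_examples=100)
print(size, error)  # prints: 3 12
```

Note that the rule is greedy: growth stops at the fourth neuron even though a fifth one would have lowered the error again in this made-up sequence.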

There are two advantages of cascade neural networks. First, neither the size nor
the connectivity of the neural network has to be predefined, that is, the network is
automatically built up. Second, the cascade network learns fast because each
of its neurons is trained independently from other neurons.

However, the algorithm can train a cascade network well only if all the
input variables are relevant to the classification problem. In the next section,
we will describe a new algorithm, which can train cascade neural networks
in the presence of irrelevant features.



3. EVOLVING CASCADE NEURAL NETWORKS 

In this section we describe an evolving cascade neural network 
technique, which adds new input nodes as well as new neurons to the 
network while the training error is decreased. This algorithm is used to 
recognize artifacts in the clinical EEGs. 

3.1 An Evolving Cascade Neural Network 

Let us assume a classification problem is represented by m input
variables $x_1, \ldots, x_m$, some of which may be irrelevant or noisy. In this case, a
standard pre-processing technique used for selecting relevant variables may
fail because this technique does not consider useful combinations of the
input variables. A more suitable strategy is to select relevant features during
learning. To do so, let us define neurons in which the number of inputs,
p, increases as p = r + 1, where r = 0, 1, 2, ... is the layer number
in the cascade network. So for r = 0, there are m neurons with one input
variable. For the first layer, there are neurons with p = 2 inputs, and so on.

Let us now fit all the neurons for r = 0. Then among these neurons, one
can be found that provides the best performance on the validation data set.
Fix the input variable of this neuron, $x_{i_1}$, in order to connect it with all the
neurons that will be added to the network.

At the first layer, the algorithm trains the candidate-neurons with two
inputs: the variable $x_{i_1}$ and one of the remaining input variables. The neuron
with the best performance is added to the network.

Each following neuron is connected with the variable $x_{i_1}$ and the outputs
of all the previous neurons. For the second layer, the candidate-neurons have
three inputs: the first is connected with the output of the previous neuron, the
second with the input $x_{i_1}$, and the third with one of the input variables
$x_1, \ldots, x_m$.

Defining a sigmoid activation function of the neurons, we can write the
output $z_r$ for the rth neuron as follows:

$z_r = f(\mathbf{u}, \mathbf{w}) = 1 / (1 + \exp(-w_0 - \sum_{i=1}^{p} u_i w_i))$, (2)

where $\mathbf{u} = (u_1, \ldots, u_p)$ is a $p \times 1$ input vector of the rth neuron, $\mathbf{w} = (w_1, \ldots, w_p)$
is a $p \times 1$ vector of synaptic weights, and $w_0$ is a bias term.

The idea behind our learning method is that the relevance of an input
variable connected to the candidate-neuron can be estimated in an ad hoc
manner. The learning algorithm starts to train the candidate-neurons with
one input variable and then step-by-step adds new inputs and new neurons to
the network. As a result, the internal connections of the cascade neural
network are built according to the best performance achievable in each new
layer. Therefore, in building the cascade network our algorithm exploits a
greedy search heuristic.

Within our technique, the performance of the network, $C_r$, is evaluated
for each candidate-neuron at the rth layer. The value of $C_r$ depends on
the generalization ability of the trained candidate-neuron with the given
connections. Clearly, a neuron with irrelevant connections
cannot properly classify the validating examples, and consequently its value
of $C_r$ is high.

If the value $C_r$ calculated for the rth neuron is less than the value $C_{r-1}$ calculated
for the previous, (r - 1)st, neuron, the connections selected for the rth neuron
are relevant; otherwise they are irrelevant. Formally, this heuristic can be
described by the following inequality:

if $C_r < C_{r-1}$, then the connections are relevant, (3)
else the connections are irrelevant.

If the inequality (3) is met, the rth neuron is added to the network. If no 
neurons satisfy this inequality, the algorithm stops. As a result, an rth neuron 
providing a minimal validation error is assigned to be an output neuron for 
the cascade network. 

3.2 An Algorithm for Evolving Cascade Neural Networks

By adding new features and neurons as they are required, the cascade 
neural network evolves during learning. The main steps of the evolving 
algorithm are described below. 

X = [x1, ..., xm];              % a pool of m input variables
p = 1;                          % the number of neuron inputs
% Train single-input neurons and calculate errors
for i = 1:m
    N1 = create_neuron(p, X(i));
    N1 = fit_weight(N1);
    E(i) = calc_error(N1);
end
[E1, F] = sort_ascend(E);
h = 1;                          % the position of the variable in F
C0 = E1(h);
% Create a cascade network NN
NN = [];
r = 0;                          % the number of hidden neurons
p = 2;
while h < m
    h = h + 1;
    V = [X(F(1)), X(F(h))];
    % Add links to the hidden neurons
    for j = 1:r
        V = [V, NN(j)];
    end
    % Create a candidate-neuron N1
    N1 = create_neuron(p, V);
    N1 = fit_weight(N1);
    C1 = calc_error(N1);
    if C1 < C0                  % condition (3)
        r = r + 1;
        p = r + 2;
        NN(r) = add_neuron(N1);
        C0 = C1;                % the accepted error becomes the reference for (3)
    end
end

The algorithm starts by training the candidate-neurons with one input and
then saves their validating errors in a pool E. The procedure sort_ascend
arranges pool E in ascending order and saves the indexes of the input
variables in a pool F. The first component of F is the index of the input
variable providing a minimal classification error $C_0$.

At the following steps, the algorithm adds new features as well as new
neurons to the network while the validation error $C_1$ calculated for the
candidate-neuron $N_1$ decreases. The weights of candidate-neurons are
updated until condition (3) is satisfied.

As a result, the cascade neural network consisting of the r neurons is 
placed in the pool NN. The size of this network is nearly minimal because 
the stopping rule is met for a minimal number of neurons. 
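The evolving procedure can be made concrete with a small, self-contained Python sketch. It is an illustration under several assumptions and not the authors' implementation: neurons are fitted by plain batch gradient descent on the cross-entropy loss (the chapter does not prescribe this here), the error is the misclassification rate on a validation set, and all function names and the synthetic data are hypothetical.

```python
import math
import random

def sigmoid(s):
    s = max(-60.0, min(60.0, s))            # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-s))

def neuron_out(w, u):
    return sigmoid(w[0] + sum(wi * ui for wi, ui in zip(w[1:], u)))

def fit_neuron(U, y, epochs=300, lr=0.5, seed=0):
    """Fit one sigmoid neuron by batch gradient descent; returns [w0, w1, ..., wp]."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(len(U[0]) + 1)]
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for u, t in zip(U, y):
            d = neuron_out(w, u) - t        # cross-entropy gradient w.r.t. the pre-activation
            grad[0] += d
            for i, ui in enumerate(u):
                grad[i + 1] += d * ui
        w = [wi - lr * g / len(U) for wi, g in zip(w, grad)]
    return w

def error_rate(w, U, y):
    return sum((neuron_out(w, u) > 0.5) != bool(t) for u, t in zip(U, y)) / len(y)

def evolve_cascade(Xtr, ytr, Xval, yval):
    m = len(Xtr[0])
    # Train single-input neurons and rank the inputs by validation error.
    errs = []
    for i in range(m):
        w = fit_neuron([[x[i]] for x in Xtr], ytr)
        errs.append(error_rate(w, [[x[i]] for x in Xval], yval))
    F = sorted(range(m), key=lambda i: errs[i])
    c_prev = errs[F[0]]
    hidden = []                             # accepted neurons: (input index, weights)
    Ztr = [[] for _ in Xtr]                 # outputs of accepted neurons, per example
    Zval = [[] for _ in Xval]
    for h in F[1:]:
        Utr = [[x[F[0]], x[h]] + z for x, z in zip(Xtr, Ztr)]
        Uval = [[x[F[0]], x[h]] + z for x, z in zip(Xval, Zval)]
        w = fit_neuron(Utr, ytr)
        c = error_rate(w, Uval, yval)
        if c < c_prev:                      # condition (3): connections are relevant
            hidden.append((h, w))
            for z, u in zip(Ztr, Utr):
                z.append(neuron_out(w, u))
            for z, u in zip(Zval, Uval):
                z.append(neuron_out(w, u))
            c_prev = c
    return F[0], hidden, c_prev

# Synthetic data: feature 0 determines the class, features 1 and 2 are noise.
rng = random.Random(1)
def make(n):
    X = [[2.0 * (k % 2) - 1.0, rng.uniform(-1, 1), rng.uniform(-1, 1)] for k in range(n)]
    return X, [k % 2 for k in range(n)]

Xtr, ytr = make(40)
Xval, yval = make(20)
best, hidden, err = evolve_cascade(Xtr, ytr, Xval, yval)
print(best, len(hidden), err)
```

On this deliberately easy data set, the first input alone already separates the classes, so no candidate with a noise input can satisfy condition (3) and the network stays at a single neuron. Data in which features are informative only in combination would lead to additional hidden neurons being accepted.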

Below we describe an application of this algorithm for recognizing 
artifacts in clinical EEGs. These EEGs have characteristics such as noise and 
features that are redundant or irrelevant to the classification problem. 

3.3 An Evolving Cascade Neural Network 

In our experiment, we used the clinical EEGs recorded via the standard
EEG channels C3 and C4 from two newborns during sleeping hours.
Following [Breidbach, Holthausen, Scheidt & Frenzel, 1998], these EEGs
were represented by spectral features calculated in 10-second segments for 6
frequency bands: subdelta (0-1.5 Hz), delta (1.5-3.5 Hz), theta (3.5-7.5 Hz),
alpha (7.5-13.5 Hz), beta 1 (13.5-19.5 Hz), and beta 2 (19.5-25 Hz).
Additionally for each band, the values of relative powers and their variances
were calculated for channels C3 and C4 and their sum, C3+C4. The total
number of the features was 72. Values of these features were normalized to
have a zero mean and a unit variance.
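The normalization step can be sketched as follows. The function name `standardize` and the sample values are hypothetical, and the sketch assumes the population variance (division by n). The chapter does not say which variance estimate was used.

```python
import math

def standardize(column):
    """Rescale one feature column to zero mean and unit variance."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    if std == 0.0:
        raise ValueError("a constant feature cannot be scaled to unit variance")
    return [(v - mean) / std for v in column]

# Hypothetical values of one spectral feature over eight EEG segments.
feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(feature)
print(z)  # prints [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

In a train/test setting, the mean and standard deviation would normally be estimated on the training subset only and then reused for the testing subset.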

The normal segments and artifacts in the EEGs were manually labeled by
an EEG-viewer who analyzed muscle and cardiac activities of patients
recorded from additional channels. As an example of normal segments and
artifacts, Figure 4 depicts a fragment of EEG containing 100 segments
presented by 36 features. In this fragment the EEG-viewer recognized
segments 15, 22, 24, 84, and 85 as artifacts and the remaining as normal.




Figure 4. Fragment of EEG containing 100 segments presented by 36 features in which the 
EEG-viewer recognized five artifacts. See also color plates. 

The patterns of EEG artifacts and normal segments can be visualized in a
space of two principal components as depicted in Figure 5. Here artifacts and
normal segments are marked by the stars and the points, respectively.

Observing these patterns, we see that the artifacts are located far away 
from the normal segments and therefore the statistical characteristics of these 
patterns should be different. The labeled EEG segments were merged and 
divided into the training and testing subsets containing 2244 and 1210 




14. Neural-network techniques for visual mining clinical 
electroencephalograms 







Figure 6. A cascade neural network trained for recognizing artifacts and normal segments in 
clinical EEGs. The squares represent synaptic connections. 

An EEG expert observing this model can conclude the following. First, there
are four features that make the most important contribution to the
classification. These features are involved in the order of their significance:
we can see that the most important feature is AbsPowBeta1 and the least
important is AbsVarDelta. So the most important contribution to the artifact
recognition in the EEG of sleeping newborns is made by AbsPowBeta1, which is
calculated for a high frequency band. This fact directly corresponds to a rule
used for recognizing muscle artifacts in the sleep EEG of adults [Brunner,
Vasko, Detka, Monahan, Reynolds, & Kupfer, 1996].

Second, the discovered model shows the combinations between the
selected features and hidden variables in the order of their classification
accuracy. The EEG expert can see that the maximal gain in accuracy is
achieved if the feature AbsPowAlphaC4 is combined with AbsPowBeta1.
Further improvement is achieved by combining the hidden variable $z_1$, which
is a function of the above two features, and the new feature AbsPowDeltaC3.
So the EEG expert can see the four combinations of the selected features and
hidden variables $z_1, \ldots, z_3$ listed in the order of increasing classification
accuracy, $p_1 < \ldots < p_4$, as follows:

$z_1$: AbsPowBet2 & AbsPowAlphaC4 → $p_1$,
$z_2$: $z_1$ & AbsPowBet2 & AbsPowDeltaC3 → $p_2$,
$z_3$: $z_2$ & $z_1$ & AbsPowBet2 & AbsPowDeltaC3 → $p_3$,
$z_4$: $z_3$ & $z_2$ & $z_1$ & AbsPowBet2 & AbsPowDelta → $p_4$,

where $z_4 = y$ is the outcome of the classification model.

The third useful issue is that the synaptic connections in the discovered
model are characterized by the real-valued coefficients, which can be
interpreted as the strength of relations between features and hidden variables.
The larger the value of the coefficient, the stronger the relation between the feature
and the hidden variable.

In general, such models can help EEG experts present the underlying
causal relations between the features and outcomes in a visual form. The
visualization of the discovered models can be useful for understanding the
nature of EEG artifacts.

In our experiments, we compared the performance of the above 
classification model and an FNN trained on the same data. Using a sigmoid 
activation function and a standard neural-network technique, we found that an
FNN with four hidden neurons and 11 input nodes provides a minimal
training error. The training and testing errors were 2.97% and 5.54%, 
respectively. 

Comparing the performances, we conclude that the discovered cascade
network slightly outperforms the FNN on the testing EEG data. The
improved performance is achieved because the cascade network is gradually
built up by adding new hidden neurons and new connections. Each new
neuron in the cascade network makes the most significant contribution to the
artifact recognition among all possible combinations of the allowed
number of features. This allows for avoiding the contribution of the noise
features and discovering the most significant relations, which can then be
visualized.

In this experiment, the FNN has misclassified more testing examples than 
the classification model described above. Therefore, we conclude that our 
cascade neural-network technique can more successfully recognize artifacts 
in clinical EEGs. At the same time the discovered classification model 
allows EEG-experts to present the basic relations between features and 
outcomes in visual form. 



4. GMDH-TYPE NEURAL NETWORKS 

In this section, we describe GMDH-type algorithms, which allow 
deriving polynomial neural networks from data. The derived networks 
generalize well because their size or complexity is near minimal. The 
derived networks are comprehensively described by concise sets of short- 
term polynomials (polynomials with few, simple terms), which are 
comprehensible for medical experts. 

4.1 A GMDH Technique 

GMDH-type neural networks are multi-layered, feed-forward networks
consisting of the so-called supporting neurons [Farlow, 1984; Madala &
Ivakhnenko, 1994; Muller & Lemke, 2003]. The supporting neurons have at
least two inputs $v_1$ and $v_2$. A transfer function g of these neurons may be
described by short-term polynomials, for example, by a linear or non-linear
polynomial:

$y = g(v_1, v_2) = w_0 + w_1 v_1 + w_2 v_2$, (4)

$y = g(v_1, v_2) = w_0 + w_1 v_1 + w_2 v_2 + w_3 v_1 v_2$, (5)

where $w_0, w_1, w_2, \ldots$ are the polynomial coefficients or synaptic weights of
the supporting neuron.

The idea behind GMDH-type algorithms is based on an evolution 
principle, which implies the generation and selection of the candidate- 
neurons. In the first layer, the neurons are connected to the input nodes, and 
in the second layer, they are connected to the previous neurons selected. For 
selecting the candidate-neurons, which provide the best classification 
accuracy, GMDH exploits the exterior criteria that are capable of evaluating 
the generalization ability of neurons on the validation data set. 

The user must properly define the number F of the selected neurons
providing the best classification accuracy. For example, the GMDH
algorithm may combine the m input variables in pairs in order to generate the
first, r = 1, layer of candidate-neurons $y_1^{(1)}, \ldots, y_{L_1}^{(1)}$, where $L_1 = m(m - 1)/2$
is the number of the neurons, which is $O(m^2)$. The algorithm trains these
candidate-neurons and then selects the F best of them in order to generate the
next layer. When generating the second layer, it combines the outputs
$y_1^{(1)}, \ldots, y_F^{(1)}$ of these F selected neurons. Here the best performance of the
algorithm is achieved for $F = 0.4 L_1$ [Farlow, 1984; Madala & Ivakhnenko,
1994].
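The size of the first layer can be checked with a short Python sketch. The function name `first_layer_pairs` is hypothetical, and rounding $0.4 L_1$ to the nearest integer is an assumption, since the text does not say how a fractional value of F is handled.

```python
from itertools import combinations

def first_layer_pairs(m):
    """All unordered pairs of the m input variables, one pair per candidate-neuron."""
    return list(combinations(range(1, m + 1), 2))

m = 5
pairs = first_layer_pairs(m)
L1 = len(pairs)               # m(m - 1)/2 = 10 candidate-neurons
F = round(0.4 * L1)           # assumed rounding of F = 0.4 * L1
print(L1, F)                  # prints: 10 4

# With the 72 EEG features mentioned in Section 3.3, the first layer is much larger.
print(len(first_layer_pairs(72)))  # prints: 2556
```

The quadratic growth of the number of candidates is what makes an exhaustive search expensive when there are many input features.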

In Figure 7, we depict an example of a three-layer, GMDH-type network. 

Figure 7. The structure of neural network grown by GMDH algorithm (layers r = 1, r = 2, r = 3)

The candidate-neurons that were selected for each of the layers are
depicted here as the gray boxes. The neuron that provides the best
classification accuracy is assigned to be the output neuron. The resulting
polynomial network, as we can see, is a three-layer network consisting of
six neurons and three input nodes. This network is described by a set of the
following polynomials:

$y_1^{(1)} = g_1(x_1, x_2)$,
$y_2^{(1)} = g_2(x_1, x_4)$,
$y_3^{(1)} = g_3(x_2, x_4)$,

Thus, for the kth training example, we can calculate the output y of the
neuron as

$y^{(k)} = g(\mathbf{v}^{(k)}, \mathbf{w}), \quad k = 1, \ldots, n$,

where w is a weight vector, v is an input vector and n is the number of
training examples.

For selecting the F best neurons, the exterior criterion is calculated on the
unseen examples of the validation set that have not been used for fitting the
weights w of neurons. These examples are reserved by dividing the dataset D
into two non-intersecting subsets $D_A = (X_A, y_A^o)$ and $D_B = (X_B, y_B^o)$, the
training and validating data sets, respectively. The sizes $n_A$ and $n_B$ of these
subsets are usually recommended to satisfy $n_A \approx n_B$ and $n_A + n_B = n$.

Let us now find a weight vector w* that minimizes the sum square error e of
the neuron calculated on the subset $D_A$:

$e = \sum_{k} (g(\mathbf{v}^{(k)}, \mathbf{w}) - y_k^o)^2, \quad k = 1, \ldots, n_A$,

To obtain the desirable vector w*, the conventional GMDH fits the 
neuron weights to the subset Da by using a Least Square Method (LSM) 
[Bishop, 1995; Farlow, 1984; Madala & Ivakhnenko, 1994], which can 
produce effective evaluations of weights with Gaussian distributed noise in 
the data. As noise in real-world data is often non-Gaussian [Duda & Hart,
2000; Tempo, Calafiore & Dabbene, 2003], we will use the learning 
algorithm described in Section 3, which does not require a hypothesis about 
the noise structure. 

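Having found a desirable weight vector w* on the subset $D_A$, we can
calculate the value $CR_i$ of the exterior criterion on the validation subset $D_B$:

$CR_i = \sum_{k} (g_i(\mathbf{v}^{(k)}, \mathbf{w}^*) - y_k^o)^2, \quad k = 1, \ldots, n_B, \; i = 1, \ldots, L_r$, (6)

where $L_r$ is the number of candidate-neurons at the rth layer.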

We can see that the calculated value of $CR_i$ depends on the behavior of
the ith neuron on the unseen examples of the subset $D_B$. Therefore, we may
expect that the value of CR calculated on the data $D_B$ would be high for the
neurons with poor generalization ability.
The values $CR_i$ calculated for all the candidate-neurons at the rth layer 
are arranged in ascending order: 

$CR_{i_1} \le CR_{i_2} \le \ldots \le CR_{i_F} \le \ldots \le CR_{i_L},$

so that the first F neurons provide the best classification accuracy. 

For each layer r, the minimal value $CR^{r}$, corresponding to the best 
neuron, is found, i.e., $CR^{r} = CR_{i_1}$. The first F best neurons are 
then used at the next layer, r + 1, and the training and selection of the 
neurons are repeated. 

The value of $CR^{r}$ decreases step by step as the number of layers 
increases and the network is built up. Once the value of CR reaches a 
minimum and then starts to increase, we can conclude that the network has 
been over-fitted. Here, because the minimum of CR was reached at the 
previous layer, we stop the training algorithm and take the desired 
network, which was grown at the third layer. 
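
The selection of the F best neurons and the stopping rule can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names select_best and grow_layers are hypothetical, the criterion values of each layer are assumed to be already computed on the validation subset, and the numbers in the usage lines are made up.

```python
def select_best(cr_values, n_best):
    """Indices of the n_best candidates with the smallest criterion values."""
    order = sorted(range(len(cr_values)), key=lambda i: cr_values[i])
    return order[:n_best]


def grow_layers(cr_per_layer, n_best):
    """Add layers while the best criterion value keeps decreasing.

    cr_per_layer[r] holds the criterion values of the candidates of layer r
    (hypothetical input; in practice each layer is trained on the fly).
    Returns the zero-based index of the layer to keep and its selected neurons.
    """
    kept_layer, kept_cr, kept_neurons = None, float("inf"), []
    for r, cr_values in enumerate(cr_per_layer):
        selected = select_best(cr_values, n_best)
        layer_min = cr_values[selected[0]]
        if layer_min >= kept_cr:
            break  # the criterion started to increase: keep the previous layer
        kept_layer, kept_cr, kept_neurons = r, layer_min, selected
    return kept_layer, kept_neurons


# Made-up criterion values for four layers; the minimum is at the third layer.
layers = [[0.9, 0.5, 0.7, 0.6], [0.45, 0.40, 0.50], [0.35, 0.38], [0.37, 0.41]]
kept_layer, kept_neurons = grow_layers(layers, n_best=2)
print(kept_layer, kept_neurons)
```

With these values the loop stops when the fourth layer gives a larger minimum than the third, so the third layer (index 2) is kept.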

4.2 A GMDH-Type Algorithm 

The conventional GMDH-type algorithms perform an exhaustive search 
for candidate-neurons in each layer. The number of candidate-neurons 
increases very quickly with the number m of inputs as well as with 
the number F of selected neurons. For the first and the following layers 
these numbers are $L_1 = O(m^2)$ and $L_2 = O(F^2)$, respectively. Below we 
describe the GMDH-type algorithm that we developed and applied to derive 
polynomial networks from data represented by more than 70 input features. 

The idea behind this algorithm is to select the neurons one-by-one and 
add them to the network along with calculated probabilities. For selecting 
the neurons we use the exterior criterion described above. 

In contrast to the conventional exhaustive search, the algorithm randomly 
selects a pair of neurons by using a “roulette wheel” in which the wheel 
area is divided into F sectors. The area of each sector is proportional to 
the classification accuracy of the corresponding neuron on the training 
data. The neurons selected in the pair are then mated with a probability 
that is proportional to their classification accuracy on the validating 
examples. When adding a new layer to the network, the algorithm attempts to 
improve the accuracy of the network a specified number of times. 

X = [x1, ..., xm]; % a pool of m input variables 
k = 0; % the number of neurons in the network NN 
% Train neurons with p inputs and calculate accuracy 
p = 1; % the number of inputs 
for i = 1:m 
    N1 = create-neuron(p, X(i)); 
    N1 = fit-weight(N1); 
    A(i) = calc-accuracy(N1); 
end 
% Create new two-input neurons for gno attempts 
p = 2; 
for i = 1:gno 
    pair = turn-roulette(p, A); 
    N1 = create-neuron(p, X(pair)); 
    N1 = fit-weight(N1); 
    ac = calc-accuracy(N1); 
    % Selection and addition 
    if ac > max(A(pair)) 
        k := k + 1; 
        NN(k) = add-new-neuron(N1); 
        A(m + k) = ac; 
    end 
end 

As a result, the variable NN contains a description of the neural network. 
This network provides the best classification accuracy on the validating 
examples. 
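
The roulette-wheel step of this algorithm can be illustrated with a short Python sketch. It is a hedged example rather than the authors' code: the function turn_roulette mirrors the name used in the listing, but its body, the use of Python's random module, and the accuracy values in the usage line are assumptions made for illustration.

```python
import random


def turn_roulette(p, accuracies, rng):
    """Draw p distinct indices; each index is chosen with a chance
    proportional to its accuracy, i.e. to the area of its wheel sector."""
    candidates = list(range(len(accuracies)))
    weights = list(accuracies)
    chosen = []
    for _ in range(p):
        spin = rng.random() * sum(weights)   # where the wheel stops
        cumulative = 0.0
        for pos, w in enumerate(weights):
            cumulative += w
            if spin < cumulative:
                break
        chosen.append(candidates.pop(pos))   # take the sector that was hit
        weights.pop(pos)                     # an index cannot be drawn twice
    return chosen


# Hypothetical accuracies: candidates 0 and 3 have empty sectors.
pair = turn_roulette(2, [0.0, 0.9, 0.8, 0.0], random.Random(0))
print(sorted(pair))
```

Candidates with zero accuracy occupy no area on the wheel, so in this usage example only the second and third candidates can be paired.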

4.3 Classification of EEGs of Alzheimer and Healthy 
Patients 

In our experiments, we used EEGs recorded from an Alzheimer patient 
and EEGs recorded from a healthy patient via the standard 19 channels C1, 
..., C19 during 8-second intervals [Duke & Nayak, 2002]. Muscle artifacts 
were deleted from these data by an expert. We used the standard Fast Fourier 
Transform technique to calculate the spectral powers in four standard 
frequency bands: delta (0-4 Hz), theta (4-8 Hz), alpha (8-14 Hz) and beta 
(14-20 Hz). 

As the spectral powers were calculated in half-second segments with a 
quarter-second overlap, each EEG record consisted of 31 segments 
represented by 76 spectral features (four bands in each of the 19 
channels). The first 15 segments of each record were used for training and 
the remaining 16 for testing, so the training and testing data consisted of 
30 and 32 EEG segments, respectively. 
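
The segment and feature counts quoted above follow from simple arithmetic, which the short Python check below reproduces. The function name count_segments is hypothetical; the numbers are those given in the text (8-second records, half-second windows, quarter-second overlap, 19 channels, four bands, two records).

```python
def count_segments(record_s, window_s, overlap_s):
    """Number of overlapping windows that fit into one record."""
    step = window_s - overlap_s
    return round((record_s - window_s) / step) + 1


segments = count_segments(8.0, 0.5, 0.25)      # windows per 8-second record
features = 19 * 4                              # channels times frequency bands
n_records = 2                                  # one Alzheimer, one healthy
train_segments = n_records * 15
test_segments = n_records * (segments - 15)
print(segments, features, train_segments, test_segments)
```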

Exploiting the non-linear polynomial from equation (5) above and with 
F = 1, our algorithm derived a polynomial network consisting of four input 
nodes and three neurons. In Figure 8, we depict this network. As you can 
see, the derived classification rule is described by a set of three polynomials: 

$y_1^{(1)} = 0.6965 + 0.3916 x_{11} + 0.2484 x_{69} - 0.2312 x_{11} x_{69},$

$y_1^{(2)} = 0.3863 + 0.5648 y_1^{(1)} + 0.5418 x_{73} - 0.4847 y_1^{(1)} x_{73},$

$y_1^{(3)} = 0.1914 + 0.7763 y_1^{(2)} + 0.2378 x_{76} - 0.2042 y_1^{(2)} x_{76},$

where $x_{11}$ is delta in C11, and $x_{69}$, $x_{73}$, and $x_{76}$ are beta 
in C12, C16, and C19, respectively. 

Figure 8. A polynomial network for classifying EEGs of an Alzheimer patient and a healthy patient (layers r = 1, r = 2 and r = 3)

Note that medical experts can interpret each of these polynomials as a 
weighted sum of two features. For example, the polynomial $y_1^{(1)}$ is 
interpreted as a weighted sum of the features $x_{11}$ and $x_{69}$. The 
first two weights show the significance of these features for the 
polynomial output, and the third weight shows the strength of the 
interaction between these two features. 
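
To make this interpretation concrete, the following Python sketch evaluates one two-input polynomial neuron of the form w0 + w1*u1 + w2*u2 + w3*u1*u2. The coefficients are those printed above for the first-layer neuron; the function name polynomial_neuron and the input values are hypothetical and chosen only to expose the role of each weight.

```python
def polynomial_neuron(w, u1, u2):
    """Output of a two-input neuron: bias, two linear terms, one interaction."""
    w0, w1, w2, w3 = w
    return w0 + w1 * u1 + w2 * u2 + w3 * u1 * u2


# Weights of the first-layer neuron quoted in the text.
W_FIRST = (0.6965, 0.3916, 0.2484, -0.2312)

only_first = polynomial_neuron(W_FIRST, 1.0, 0.0)   # bias + weight of input 1
both = polynomial_neuron(W_FIRST, 1.0, 1.0)         # adds input 2 and interaction
print(round(only_first, 4), round(both, 4))
```

Switching the second input on changes the output by w2 + w3, which shows how the negative interaction weight partly offsets the contribution of the second feature when both inputs are active.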

Having applied the standard neural-network technique to these data, we 
found that an FNN consisting of 8 input nodes and 2 hidden neurons 
provides the best classification accuracy. We also applied a conventional 
GMDH-type technique to these data. All three neural networks misclassified 
one testing segment, which corresponds to a testing error rate of 3.12%. 

4.4 Recognition of EEG Artifacts 

The EEGs used in our next experiments were recorded from two sleeping 
newborns. These EEGs were represented by 72 spectral and statistical 
features, as described in [Breidbach et al., 1998], calculated in 10-second 
segments. For training, we used the EEG recorded from one newborn, and for 
testing the EEG recorded from the other newborn. These EEGs consisted of 
1347 and 808 examples, in which an expert labeled 88 and 71 segments, 
respectively, as artifacts. 

For comparison, we used the standard neural network and the 
conventional GMDH techniques. We found that the best FNN consisted 
of 10 hidden neurons and misclassified 3.84% of the testing examples. The 
GMDH-type network was grown with the activation function of equation (5) 
above, with m = 12 inputs and F = 40. We ran our algorithm with the same 
parameters and derived a polynomial network consisting of seven input 
nodes and 11 neurons, as depicted in Figure 9. 



Figure 9. A polynomial network derived for recognizing EEG artifacts (layers r = 1 to r = 4)

This polynomial network misclassified 3.47% of the testing examples. 
This network is described by the following set of 11 short-term polynomials: 

= 0.9049 - 0.1707JC, - 0.1616x„ + O.OSSOx^jr,,, 

= 0.9023 - 0.2128X, - 0.1389x,g + 0.0438x5^,,, 
y/'* = 0.9268 - 0.1828x, - 0.1 195 jc,, + 0.0233x^„ 

= 0.9323 - 0.2057x, - 0.046 1 jc,j + 0.0246x^,„ 

= 0.9247 - 0.1822x, - 0.095 lx,, + 0.0196xpc,„ 
y,® = 0.0590 + 0.2810y/'> + 0.3055y/’ + 0.3670y/‘’y4 
y ® = 0.0225 + 0.4144y,® + 0.3812y3“> + 0.1878y3‘V"> 
y 3 ® = 0.0609 + 0.2917y/‘> + 0.2738y/’ + 0.3880y3®y3®, 
y,® = 0.0551 + 0.3033y,® + 0.3896y ® + 0.2540yi®y ®, 
y/’ = 0.0579 + 0.4058y ® + 0.2834y3® + 0.2549y ®y 3 ®, 
y/'*) = -0.0400 + 0.6196y,® + 0.5702y ® - 0.1504y,®y 

where $x_5$ is the absolute power of subdelta in C4, $x_6$ is the absolute 
power of subdelta, $x_{21}$ is the real power of alpha, $x_{28}$ is the 
absolute power of beta1 in C3, $x_{55}$ is the absolute variance of theta in 
C4, $x_{57}$ is the absolute variance of subdelta and $x_{62}$ is the 
absolute variance of subdelta in C3. 

Table 1 lists the errors of the FNN, GMDH-type and polynomial 
neural networks (PNN) on the training and testing data. Note that both the 
FNN and the PNN were trained 100 times because their weights are 
initialized randomly. The conventional GMDH algorithm was run once because 
it uses the standard LSM technique, which is deterministic, to evaluate the 
synaptic weights. 
Table 1. The classification errors of neural networks 

                        Error rate, % 
Data                    FNN     GMDH    PNN 
Train (patient 1)       2.00    2.06    2.23 
Test (patient 2)        3.84    4.08    3.47 

Observing the results listed in Table 1, we can conclude that the PNN 
trained by our method recognizes EEG artifacts slightly better than the FNN 
and GMDH-type network. 



5. NEURAL-NETWORK DECISION TREES 

In this section, we describe neural-network decision-tree techniques, 
which exploit multivariate linear tests and algorithms searching for relevant 
features. The results of linear tests are easily visualized for medical experts. 
We also describe a new decision tree structure and an associated algorithm 
that is able to select the relevant features. This technique is shown to 
perform well on large-scale clinical EEGs. 

5.1 Decision Trees 

Decision tree (DT) methods have been successfully used for deriving 
multi-class concepts from real-world data represented by noisy features 
[Brodley & Utgoff, 1995; Duda & Hart, 2000; Quinlan, 1993; Salzberg, 
Delcher, Fasman & Henderson, 1998]. Experts find that results from a DT 
are easy to follow by tracing the route from its entry point to its outcome. 
This route consists of a sequence of questions that are useful for 
the classification and understandable for medical experts. 

Conventional DTs consist of nodes of two types. One is a splitting 
node containing a test, and the other is a leaf node assigned to an 
appropriate class. Each possible outcome of the test is represented by a 
branch of the DT. An example is presented to the root of the DT and follows 
the branches until a leaf node is reached. The name of the class at the 
leaf is the resulting classification. 

A node can test one or more of the input variables. A DT is multivariate, 
or oblique, if its nodes test more than one of the features. Multivariate DTs 
are in general much shorter than those which test a single variable. These 
DTs can use Threshold Logical Units (TLUs), or perceptrons, that compute a 
weighted sum of the input variables. Medical experts can interpret such tests 
as a weighted sum of questions, for example: Is 0.4 * BloodPressure + 0.2 * 
HeartRate > 46? The weights here usually represent the significance of each 
feature for the test outcome. 

To learn concepts represented by numerical features, [Duda & Hart, 2000] 
and [Salzberg et al., 1998] have suggested multivariate DTs that can 
classify linearly separable patterns. By definition, such patterns are 
divided by linear tests. However, by using the algorithms of [Brodley & 
Utgoff, 1995; Frean, 1992; Parekh et al., 2000; Salzberg et al., 1998], DTs 
can also learn to classify examples that are not linearly separable. 

In general, DT algorithms require computational time that grows 
in proportion to the number of training examples, input features, and 
classes. Consequently, the computational time required to derive 
multi-class concepts from large-scale data sets can become overwhelming, 
especially if the number of training examples is in the tens of thousands. 

5.2 A Linear Machine 

A Linear Machine (LM) is a set of r linear discriminant functions 
calculated to assign a training example to one of the r > 2 classes [Duda & 
Hart, 2000]. Each node of the LM tests a linear combination of m input 
variables $x_1, x_2, \ldots, x_m$ and $x_0 \equiv 1$. 

Let us introduce an m-input vector $\mathbf{x} = (x_0, x_1, \ldots, x_m)$ and 
a discriminant function g(x). Then the linear test at the jth node has the 
following form: 

$g_j(\mathbf{x}) = \sum_i w_i^j x_i = \mathbf{w}^{jT} \mathbf{x} > 0, \quad i = 0, \ldots, m, \quad j = 1, \ldots, r,$ (7)

where $w_0^j, \ldots, w_m^j$ are the real-valued coefficients, also known as 
the weight vector of the jth TLU. 

The LM assigns an example x to the jth class if and only if the output of 
the jth node is larger than the outputs of the other nodes: 

$g_j(\mathbf{x}) > g_k(\mathbf{x}), \quad k \ne j = 1, \ldots, r.$ (8)

This strategy of making a decision is known as Winner Take All (WTA). 

While the LM is learning, the weight vectors $\mathbf{w}^j$ and 
$\mathbf{w}^k$ of the discriminant functions $g_j$ and $g_k$ are updated for 
each example x that the LM misclassifies. A learning rule increases the 
weights $\mathbf{w}^j$, where j is the class to which the example x actually 
belongs, and decreases the weights $\mathbf{w}^k$, where k is the class to 
which the LM has erroneously assigned the example x. This is done using the 
following error correction rule: 

$\mathbf{w}^j := \mathbf{w}^j + c\mathbf{x}, \quad \mathbf{w}^k := \mathbf{w}^k - c\mathbf{x},$ (9)

where c > 0 is a given amount of correction. 
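
The winner-take-all decision and the error correction rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the zero initial weights, the tie-breaking in favor of the lowest class index, and the three toy examples are not taken from the text.

```python
def discriminants(W, x):
    """Outputs g_j(x) of all nodes; W[j] is the weight vector of class j."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for w in W]


def classify(W, x):
    """Winner take all: the class whose node gives the largest output."""
    g = discriminants(W, x)
    return max(range(len(g)), key=lambda j: g[j])


def correct(W, x, true_class, c=1.0):
    """Apply the error correction rule if x is misclassified."""
    predicted = classify(W, x)
    if predicted != true_class:
        W[true_class] = [w + c * xi for w, xi in zip(W[true_class], x)]
        W[predicted] = [w - c * xi for w, xi in zip(W[predicted], x)]
    return predicted


def train(W, examples, epochs=20, c=1.0):
    """Cycle through the examples until an epoch passes without errors."""
    for _ in range(epochs):
        errors = sum(correct(W, x, q, c) != q for x, q in examples)
        if errors == 0:
            break
    return W


# Toy, linearly separable data; the first component is the constant x0 = 1.
examples = [([1, 2, 0], 0), ([1, 0, 2], 1), ([1, -2, -2], 2)]
W = train([[0.0] * 3 for _ in range(3)], examples)
print([classify(W, x) for x, _ in examples])
```

On this toy data two corrections in the first epoch are enough, and the second epoch passes without errors.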

If the training examples are linearly separable, the preceding procedure 
can yield a desired LM giving maximal classification accuracy in a finite 
number of steps [Duda & Hart, 2000]. If the examples are not linearly 
separable, this training procedure may not provide predictable classification 
accuracy. For this case other training procedures have been suggested; we 
discuss some of them below. 

5.3 A Pocket Algorithm 

To train the DT from data that are not linearly separable, [Gallant, 1993] 
suggested a Pocket Algorithm. This algorithm seeks weights of multivariate 
tests that minimize the classification error. The Pocket Algorithm uses the 
error correction rule (9) above to update the weights $\mathbf{w}^j$ and 
$\mathbf{w}^k$ of the corresponding discriminant functions $g_j$ and $g_k$. 
The algorithm saves in the Pocket the best weight vectors $W^P$ that are 
seen during training. 

In addition, Gallant has suggested the “ratchet” modification of the 
Pocket Algorithm. The idea behind this modification is to replace the 
pocket weights $W^P$ by the current W only if the current LM has correctly 
classified more training examples than was achieved by $W^P$. The modified 
algorithm finds the optimal weights if sufficient training time is allowed. 

To implement this idea, the algorithm cycles through training the LM for a 
given number of epochs, $n_e$. For each epoch, the algorithm counts the 
length L of the current run of consecutive correctly classified examples 
and evaluates the accuracy A of the LM on the training set. 

In correspondence with inequality (8), the LM assigns a training example 
(x, q) to the jth class, where q is the class to which the example x 
actually belongs. The LM training algorithm consists of the following steps: 

W = init-weight(); 
[Wp, Lp, Ap] = set-pocket(W); 
L = 0; % the length of the current run of correct classifications 
for i = 1:n % n is the number of training examples 
    [x, q] = get-random(X); 
    j = classify(x); 
    if j ~= q 
        L = 0; % a misclassification ends the current run 
        W(q) := W(q) + c*x; % increase the weights of the true class q 
        W(j) := W(j) - c*x; % decrease the weights of the wrong class j 
    else 
        L := L + 1; 
        if L > Lp 
            A = calc-accuracy(); 
            if A > Ap 
                % Update the pocket 
                Wp = W; 
                Lp = L; 
                Ap = A; 
            end 
        end 
    end 
end 

As the search time that the algorithm requires grows in proportion to 
the number of training examples as well as the number of input variables 
and classes, the number of epochs must be large enough to achieve acceptable 
classification accuracy. For example, in our case, the number of epochs 
is set to the number of training examples. The best classification 
accuracy of the LM is achieved if c is equal to 1. 
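
A compact Python sketch of the pocket algorithm with the ratchet may help to follow the steps. It is an illustration under stated assumptions, not the authors' implementation: the helper names, the zero initial weights, the seeded random generator, and the small data set with one deliberately mislabeled example are all hypothetical.

```python
import random


def predict(W, x):
    """Winner take all over the linear discriminant functions."""
    g = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
    return max(range(len(g)), key=lambda j: g[j])


def accuracy(W, data):
    """Fraction of correctly classified training examples."""
    return sum(predict(W, x) == q for x, q in data) / len(data)


def pocket_train(data, n_classes, n_steps, c=1.0, seed=0):
    rng = random.Random(seed)
    W = [[0.0] * len(data[0][0]) for _ in range(n_classes)]
    pocket_W, pocket_run, pocket_acc = [w[:] for w in W], 0, accuracy(W, data)
    run = 0                                   # current run of correct answers
    for _ in range(n_steps):
        x, q = rng.choice(data)
        j = predict(W, x)
        if j != q:
            run = 0                           # a mistake ends the run
            W[q] = [w + c * xi for w, xi in zip(W[q], x)]
            W[j] = [w - c * xi for w, xi in zip(W[j], x)]
        else:
            run += 1
            if run > pocket_run:
                acc = accuracy(W, data)
                if acc > pocket_acc:          # ratchet: only strict improvement
                    pocket_W = [w[:] for w in W]
                    pocket_run, pocket_acc = run, acc
    return pocket_W, pocket_acc


# Toy data (x0 = 1, one feature); the last example is deliberately mislabeled.
data = [([1, -2.0], 0), ([1, -1.0], 0), ([1, 1.0], 1), ([1, 2.0], 1), ([1, 1.5], 0)]
initial_acc = accuracy([[0.0, 0.0], [0.0, 0.0]], data)
pocket_W, pocket_acc = pocket_train(data, n_classes=2, n_steps=200)
print(initial_acc, pocket_acc)
```

Because of the ratchet, the accuracy stored in the pocket can never fall below that of the initial weights, even though the current weights keep changing on data that are not linearly separable.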

When the training examples are not linearly separable, the classification 
error of LMs may be unpredictably large. There are two cases in which the 
behavior of the LM is destabilized during training. First, a misclassified 
example may be far from the hyperplane dividing the classes. In such a case, 
the dividing hyperplane has to be substantially readjusted, and such 
relatively large adjustments destabilize the training procedure. Second, the 
misclassified example may lie very close to the dividing hyperplane, so that 
the weights do not converge. 

To improve the convergence of the training algorithm, [Frean, 1992] has 
suggested a thermal procedure. This procedure decreases attention to 
large errors by using the following correction: 

$c = \frac{\beta}{\beta + k}, \quad k = \frac{(\mathbf{w}^j - \mathbf{w}^i)^T \mathbf{x}}{2\mathbf{x}^T \mathbf{x}} + \varepsilon,$

where $\beta$ is a parameter initialized to 2, and $\varepsilon > 0$ is a 
given constant. 

The parameter $\beta$ is adjusted during training as follows. First, the 
magnitudes of the weight vectors are summed. If the sum decreases for the 
current weight adjustment, but increased during the previous adjustment, the 
parameter $\beta$ is reduced: $\beta := a\beta - b$, where a and b are given 
constants. 

This reduction of $\beta$ enables the algorithm to spend more time training 
the LM with small values of $\beta$, which are needed to refine the location 
of the dividing hyperplane. However, experiments on real-world 
classification problems show that the training times for the thermal 
procedure and the LM are comparable [Parekh et al., 2000]. 
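
The effect of the thermal correction can be seen from a small numerical sketch in Python. It takes the distance term k as a given number and only illustrates the relations c = beta / (beta + k) and beta := a * beta - b; the function names and the values chosen for k, a and b are assumptions made for illustration, not values from the text.

```python
def thermal_correction(beta, k):
    """Amount of correction c for an example with distance term k."""
    return beta / (beta + k)


def anneal(beta, a, b):
    """One reduction step of the parameter beta."""
    return a * beta - b


beta = 2.0                                   # initial value given in the text
near = thermal_correction(beta, k=0.1)       # example close to the hyperplane
far = thermal_correction(beta, k=10.0)       # example far from the hyperplane
cooled = anneal(beta, a=0.999, b=0.0005)     # hypothetical constants a and b
print(round(near, 3), round(far, 3), round(cooled, 4))
```

An example far from the hyperplane receives a much smaller correction than a close one, and as beta is reduced the corrections for distant examples shrink further.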

5.4 Feature Selection Algorithms 

In order to derive accurate and understandable DT models, we must 
eliminate the features that do not contribute to the classification accuracy of 
DT nodes. To eliminate irrelevant features, we use the Sequential Feature 
Selection (SFS) algorithms [Duda & Hart, 2000; Gallant, 1993] based on a 
greedy heuristic, also called the hill-climbing strategy. The selection is 
performed while the DT nodes are trained from data. This avoids over-fitting 
more effectively than the standard methods of feature pre-processing. 

The SFS algorithm exploits a bottom-up search method and starts learning 
with one feature. Then it iteratively adds the new feature that provides the 
largest improvement in the classification accuracy of the linear test. The 
algorithm continues to add features until a specified stopping criterion is 
met. During this process, the best linear test $T_b$ with the minimum number 
of features is stored. In general, the SFS algorithm consists of the 
following steps. 

p = 1; % the number of features in the test 
% Test the unit-variant tests T 
for i = 1:m 
    T(i) = test(p, X(i)); 
end 
Tb = find-best-test(T); 
while stop-rule(Tb, p) 
    p := p + 1; 
    T1 = find-best-test(p, T); 
    % Compare the accuracies of T1 and Tb 
    if T1.A > Tb.A 
        Tb = T1; 
    end 
end 

The stopping rule is satisfied when all the features have been involved in 
the test. In this case m + (m - 1) + ... + (m - k) linear tests have been made, 
where k is the number of steps. Clearly, if the number of features, m, 
as well as the number of examples, n, is large, the computational time 
needed to terminate may be unacceptable. 

To stop the search early and reduce the computational time, the following 
heuristic stopping criterion was suggested by [Parekh et al., 2000]. They 
found that if at any step the accuracy of the best test decreases by more 
than 10%, then the chance of subsequently finding a better test with more 
features is slight. 
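
The greedy search and the early stopping heuristic can be sketched in Python as follows. This is a hedged illustration: the function sequential_forward_selection, the callback evaluate, and the toy accuracy function are hypothetical, and the 10% criterion is interpreted here as an absolute drop of 0.10 below the best accuracy found so far, which is an assumption.

```python
def sequential_forward_selection(n_features, evaluate, max_drop=0.10):
    """Greedy bottom-up search; evaluate(features) returns a test accuracy."""
    remaining = set(range(n_features))
    current = []
    best, best_acc = [], 0.0
    n_tests = 0
    while remaining:
        # Try each remaining feature and keep the one giving the best test.
        step_acc, step_feature = -1.0, None
        for f in sorted(remaining):
            acc = evaluate(current + [f])
            n_tests += 1
            if acc > step_acc:
                step_acc, step_feature = acc, f
        if step_acc < best_acc - max_drop:
            break                              # heuristic early stop
        current = current + [step_feature]
        remaining.remove(step_feature)
        if step_acc > best_acc:                # store the best, smallest test
            best, best_acc = list(current), step_acc
    return best, best_acc, n_tests


RELEVANT = {0, 2}                              # toy ground truth


def toy_accuracy(features):
    """Made-up accuracy: relevant features help, irrelevant ones hurt a little."""
    chosen = set(features)
    return 0.5 + 0.2 * len(chosen & RELEVANT) - 0.02 * len(chosen - RELEVANT)


best, best_acc, n_tests = sequential_forward_selection(4, toy_accuracy)
print(best, round(best_acc, 2), n_tests)
```

With four features and no early stop, the search makes 4 + 3 + 2 + 1 = 10 tests, in agreement with the count given above, and returns the two relevant features.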

However, the classification accuracy of the resulting linear test depends 
on the order in which the features have been included in the test. For the SFS 
algorithm, the order in which the features are added is determined by their 
contribution to the classification accuracy. As we know, the accuracy 
depends on the initial weights as well as on the sequence of the training 
examples selected randomly. For this reason the linear test can be non- 
optimal, i.e., the test can include more or fewer features than needed for the 
best classification accuracy. The chance of selecting the non-optimal linear 
test is high, because the algorithm compares the tests that differ by one 
feature only. 
5.5 Derivation of Neural-Network Decision Trees 

The idea behind our DT derivation algorithm is to train the test nodes 
individually and then group them in order to linearly approximate the 
dividing hyperplanes. The DT test nodes, which are realized by TLUs, are 
individually trained to classify examples of two classes. For r classes, 
therefore, it is necessary to classify $O(r^2)$ variants of the training 
subsets and train the same number of TLUs. 

We can consider the trained TLUs that deal with one class as the 
hidden neurons of a neural network. The number of such networks is equal 
to the number of classes, r. The contributions of these hidden neurons are 
summed by the output TLU. Therefore each neural network makes a 
linear approximation to the dividing hyperplane between classes. 

Let us introduce a TLU, $f_{i/j}$, performing the linear test (7) above, 
which learns to divide the examples of a pair of classes $\Omega_i$ and 
$\Omega_j$. If the training examples of these classes are linearly 
separable, then the output y of the TLU is described as follows: 

$y = f_{i/j}(\mathbf{x}) = 1, \quad \forall \mathbf{x} \in \Omega_i,$ (10)

$y = f_{i/j}(\mathbf{x}) = -1, \quad \forall \mathbf{x} \in \Omega_j.$

Indeed, medical experts can find that the features dividing two classes are 
simpler to observe than those dividing multiple classes for r > 2. 
Fortunately, when the number of classes does not exceed several tens, such a 
pairwise approach can be efficiently applied to a multi-class problem by 
transforming it into a set of simple binary classifiers. 

Having introduced the linear tests, we can now illustrate the idea of our 
derivation algorithm with a simple case of r = 3 classes. In Figure 10, we 
depict three classes $\Omega_1$, $\Omega_2$, and $\Omega_3$, which hardly 
overlap each other. For this simple case, we need to train r(r - 1)/2 = 3 
TLUs. The lines in Figure 10 depict the hyperplanes $f_{1/2}$, $f_{1/3}$, and 
$f_{2/3}$ of the TLUs trained to divide the classes $\Omega_1$ and 
$\Omega_2$, $\Omega_1$ and $\Omega_3$, as well as $\Omega_2$ and $\Omega_3$. 

Also in Figure 10, we depict three new dividing hyperplanes $g_1$, $g_2$ and 
$g_3$. The first hyperplane, $g_1$, is a superposition of the linear tests 
$f_{1/2}$ and $f_{1/3}$, i.e., $g_1 = f_{1/2} + f_{1/3}$. The linear tests 
$f_{1/2}$ and $f_{1/3}$ here are summed with weights equal to 1, because both 
give positive outputs on the examples belonging to the class $\Omega_1$. 
Correspondingly, the second and third dividing hyperplanes are 
$g_2 = f_{2/3} - f_{1/2}$ and $g_3 = -f_{1/3} - f_{2/3}$. 
Figure 10. The approximation given by the dividing hyperplanes $g_1$, $g_2$ and $g_3$ 

We can see that an example x belonging to class $\Omega_2$ causes the 
outputs of $g_1$, $g_2$ and $g_3$ to be equal to 0, 2, and -2, respectively: 

$g_1(\mathbf{x}) = f_{1/2}(\mathbf{x}) + f_{1/3}(\mathbf{x}) = -1 + 1 = 0,$
$g_2(\mathbf{x}) = f_{2/3}(\mathbf{x}) - f_{1/2}(\mathbf{x}) = 1 + 1 = 2,$
$g_3(\mathbf{x}) = -f_{1/3}(\mathbf{x}) - f_{2/3}(\mathbf{x}) = -1 - 1 = -2.$

We can see that among $g_1$, $g_2$, and $g_3$, the second output is the 
largest, $g_2 = 2$. Finally, the DT, using the WTA strategy, correctly 
assigns the example x to the class $\Omega_2$. 

For this case, the dividing hyperplanes $g_1$, $g_2$, and $g_3$ were 
approximated by r = 3 feed-forward neural networks, each consisting of 
(r - 1) = 2 hidden TLUs. In Figure 11, we depict these networks, in which 
the hidden neurons perform the linear tests $f_{1/2}$, $f_{1/3}$, and 
$f_{2/3}$. The hidden neurons are connected to the output neurons $g_1$, 
$g_2$ and $g_3$ with the weights equal to (+1, +1), (-1, +1) and (-1, -1), 
respectively. 




Figure 11. An example of the neural-network decision tree for r = 3 classes 

In general, for r > 2 classes, the neural network consists of r(r - 1)/2 
hidden neurons $f_{1/2}, \ldots, f_{i/j}, \ldots, f_{r-1/r}$ and r output 
neurons $g_1, \ldots, g_r$, where i < j, j = 2, ..., r. The output neuron 
$g_i$ is connected to (r - 1) hidden neurons, which are partitioned into two 
groups: the first group consists of the hidden neurons $f_{i/k}$ for which 
k > i, and the second group consists of the hidden neurons $f_{k/i}$ for 
which k < i. The final step is to set up the weights of the output neurons: 
each output neuron $g_i$ is connected to the hidden neurons $f_{i/k}$ and 
$f_{k/i}$ with weights equal to +1 and -1, respectively. 
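
The wiring of the output neurons can be written down for any number of classes. The following Python sketch builds the +1 and -1 connections and applies the winner-take-all rule; the names output_weights and classify and the dictionary representation of the hidden outputs are hypothetical. The usage lines reproduce the r = 3 example for an input belonging to the second class.

```python
from itertools import combinations


def output_weights(r):
    """Hidden pairs (i, j), i < j, and the +1/-1 connections of each output."""
    pairs = list(combinations(range(1, r + 1), 2))
    weights = {i: {} for i in range(1, r + 1)}
    for i, j in pairs:
        weights[i][(i, j)] = +1    # output +1 of the pair test votes for class i
        weights[j][(i, j)] = -1    # output -1 of the pair test votes for class j
    return pairs, weights


def classify(hidden, r):
    """Sum the weighted hidden outputs and pick the largest output neuron."""
    _, weights = output_weights(r)
    g = {i: sum(w * hidden[pair] for pair, w in weights[i].items())
         for i in weights}
    winner = max(g, key=g.get)
    return winner, g


# Outputs of the three pairwise tests for an example of the second class.
hidden = {(1, 2): -1, (1, 3): +1, (2, 3): +1}
winner, g = classify(hidden, 3)
print(winner, g)
```

For r = 16 the same construction yields 120 hidden neurons, with each output neuron connected to 15 of them.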

As we see, each hidden neuron in the network learns to distinguish one 
class from another. The neurons learn independently of each other. However, 
the performance of the hidden neurons depends on the contribution of the 
input variables to the classification accuracy. For this reason, we next 
discuss a DT derivation algorithm which is able to select relevant features. 



5.6 A Decision Tree Derivation Algorithm 

The feature selection algorithm that we discussed in Section 5.4 searches 
for new features that cause the largest increase in the classification 
accuracy of the linear tests. Recall that this algorithm compares tests 
that differ by one feature and then uses the greedy heuristic to select the 
new feature that provides the largest increase in the accuracy of the 
current test. 

In our experiments, we have found that comparing linear tests that differ 
by more than one feature increases the chance of accepting tests that 
improve the classification accuracy of the DT. We have also found that in 
real-world classification problems represented by noisy data, the greedy 
heuristic often finds a local minimum of the classification error. To 
increase the chance of escaping from local minima, we can evaluate the 
cross-validation classification error of the linear tests. Using these 
heuristics, we developed the DT derivation algorithm shown below: 

X = [x1, x2, ..., xm]; 
% Test the unit-variant tests U 
for i = 1:m 
    U(i) = train-test(X(i)); 
    C(i) = calc-accuracy(U(i)); 
end 
% Create the pools P and F 
P = C/max(C); 
[P, F] = sort-descend(P); 
Tb = []; % initialize 
Ab = 0; 
for k = 1:attempt-no 
    T = []; 
    A = 0; % the accuracy of the current test T 
    i = 0; 
    feature-no = 0; 
    % Search for a candidate-test T 
    while stop-rule(T, i) 
        i := i + 1; 
        if P(i) > rand(1) % wheel of roulette 
            T1 = [T X(F(i))]; % the features of the test 
            T1 = train-test(T1); 
            A1 = calc-accuracy(T1); 
            if A1 > A 
                T := T1; 
                A := A1; 
                feature-no := feature-no + 1; 
            end 
        end 
    end 
    % Replace the best test Tb 
    if A > Ab 
        Ab := A; 
        Tb := T; 
    end 
end 

To search for the best multivariate test, this algorithm exploits an 
evolutionary strategy: it starts by training the single-variable tests, 
each including one feature $x_i$, i = 1, ..., m. Then it calculates a 
probability $p_i$ that is proportional to the accuracy of the ith test. 

The calculated probabilities are arranged in decreasing order, 
$p_{i_1} > p_{i_2} > \ldots > p_{i_m}$, and placed in a pool P. Likewise the 
features $x_{i_1}, x_{i_2}, \ldots, x_{i_m}$ are placed in a pool F. 

Next, the algorithm sets the test T to an empty array and the feature 
index i to 1. Then it attempts to add the feature $x_i$ to T. If this occurs, 
with the calculated probability $p_i$, a candidate-test $T_1$ is formed. The 
weights of this test are fitted to the training data, and then the 
classification accuracy $A_1$ of the test on the validation set is 
calculated. 

If the accuracy $A_1$ is higher than the accuracy A of the current test 
T, then T is replaced by the candidate-test $T_1$. The number of features 
used in the new linear test then increases by one. 

The algorithm is repeated until a stopping criterion is met. This criterion 
is met in two cases: first, if the linear test T includes the given number 
$N_f$ of the m input variables, or second, if all the features have been 
tested. 

Note that the algorithm compares the linear tests T and $T_1$, which may 
differ by several features. This increases the chance of finding the best 
linear test. 

To increase the chance of locating the best solution, the linear tests are 
trained for a given number of attempts, each time with a different sequence 
of features. As a result, a unique set of features is formed in the test 
$T_b$. Using these features, the linear test classifies the training 
examples with the best classification accuracy. 

For fitting the DT linear tests, we used 2/3 of the training examples and 
evaluated the classification accuracy on all the training data. We varied the 
number of attempts from 5 to 25. 
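
The search described in this section can be summarized by the following Python sketch. It is a simplified illustration, not the authors' implementation: the function derive_test, the callback evaluate (standing in for training a linear test and measuring its accuracy), the parameter max_features, and the toy accuracy function are all assumptions.

```python
import random


def derive_test(unit_accuracy, evaluate, n_attempts, max_features, seed=0):
    """Roulette-based search for the feature subset giving the best test.

    unit_accuracy[i] is the accuracy of the single-feature test on feature i;
    evaluate(features) returns the accuracy of a test using those features.
    """
    rng = random.Random(seed)
    top = max(unit_accuracy)
    # Pool F: features by decreasing accuracy; pool P: matching probabilities.
    pool_f = sorted(range(len(unit_accuracy)),
                    key=lambda i: unit_accuracy[i], reverse=True)
    pool_p = [unit_accuracy[i] / top for i in pool_f]
    best_test, best_acc = [], 0.0
    for _ in range(n_attempts):
        test, acc = [], 0.0
        for p, feature in zip(pool_p, pool_f):
            if len(test) >= max_features:
                break                          # stopping rule: enough features
            if p > rng.random():               # the wheel of roulette
                candidate = test + [feature]
                cand_acc = evaluate(candidate)
                if cand_acc > acc:             # accept only an improvement
                    test, acc = candidate, cand_acc
        if acc > best_acc:                     # replace the best test
            best_test, best_acc = test, acc
    return best_test, best_acc


RELEVANT = {0, 2}                              # toy ground truth


def toy_accuracy(features):
    """Made-up accuracy: relevant features help, irrelevant ones hurt a little."""
    chosen = set(features)
    return 0.5 + 0.2 * len(chosen & RELEVANT) - 0.02 * len(chosen - RELEVANT)


unit = [toy_accuracy([i]) for i in range(4)]
best_test, best_acc = derive_test(unit, toy_accuracy, n_attempts=5, max_features=3)
print(best_test, round(best_acc, 2))
```

In this toy setting the two relevant features have probability 1 and are always tried first, while the irrelevant ones are tried only occasionally and are rejected because they lower the accuracy.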

5.7 Learning a Multi-Class Concept from the EEGs 

Next we describe an application of the above DT algorithm to learning a 
multi-class concept from clinical EEG recordings. The EEGs were recorded 
from 65 sleeping patients via the standard EEG electrodes C3 and C4. These 
patients were healthy newborns with ages ranging between 35 and 51 weeks. 
The desired concept must distinguish the EEG recordings between these 
r = 16 age groups (classes). 

Following [Breidbach et al., 1998], the raw EEGs were segmented and 
transformed into 72 spectral and statistical features. Some of these features 
were redundant or irrelevant to the classification problem. 

For training and testing the DT, we used 39399 and 19670 EEG segments, 
respectively. For the given r = 16 classes, the DT included 
r(r - 1)/2 = 120 simple binary classifiers. The training errors of these 
classifiers varied between 0 and 15%, see Figure 12(a). 




Figure 12. The training errors (a) and the number of features (b) for 120 binary classifiers (horizontal axis: classifiers, 0 to 120) 

Note that the trained classifiers use different sets of the features (input 
variables). The number of these features varies from 7 to 58, see Figure 
12(b). 
The trained neural-network DT correctly classified 80.8% of the 
training and 80.1% of the testing examples. Summing over all the segments 
belonging to one EEG recording, the trained DT correctly classified 89.2% 
and 87.7% of the 65 EEG recordings on the training and testing examples, 
respectively. 

In Figure 13, we depict the distributions of the classified testing segments 
over all 16 classes for two patients belonging to the second and third age 
groups, respectively. Observing these distributions, we can give a 
probabilistic interpretation of the decisions. For example, we can decide that 
the patients belong to the second and third age groups with probabilities 0.92 
and 0.58, respectively. 
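
The probabilistic interpretation amounts to counting how the segments of one recording are distributed over the classes. A minimal Python sketch follows; the function recording_decision is hypothetical, and the segment counts are invented so that the winning class receives a share of 0.92, as in the first example above.

```python
from collections import Counter


def recording_decision(segment_labels):
    """Class distribution of the segments of one recording and its winner."""
    counts = Counter(segment_labels)
    total = len(segment_labels)
    distribution = {label: n / total for label, n in counts.items()}
    winner = max(distribution, key=distribution.get)
    return winner, distribution


# Hypothetical recording: 50 segments, 46 of them assigned to age group 2.
labels = [2] * 46 + [3] * 3 + [1] * 1
winner, distribution = recording_decision(labels)
print(winner, distribution[winner])
```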




Figure 13. The distribution of the classified testing segments for two patients 

We compared this DT technique with some other data mining techniques on 
the same EEG data. First, we derived the LM described earlier. Second, we 
trained feed-forward neural networks by using the standard 
back-propagation algorithm. The structures of these neural networks 
included from 8 to 20 input nodes and up to 20 hidden neurons. Third, we 
independently trained r = 16 binary classifiers to distinguish one class 
from the others. Fourth, we trained a binary decision tree consisting of 
r - 1 = 15 linear classifiers. However, in our experiments none of these 
standard techniques achieved acceptable classification accuracy. 
6. A RULE EXTRACTION TECHNIQUE 

In some cases, the classification models learned from data by a neural 
network can be represented as decision tree rules [Avilo Garcez et al., 2001; 
Sethi & Yoo, 1997]. However, in general this technique cannot guarantee that 
the resulting decision tree was not trapped in a local solution [Kovalerchuk 
& Vityaev, 2000]. To take this into account, we next describe our 
technique, developed to derive decision tree rules in an ad hoc manner. 

The idea behind our method is to project an original classification 
problem into an input space in which most of the training examples become 
separable. The dimensionality of such an input space can be significantly 
less than that of the original space. The neural-network techniques described 
in sections 3 and 4 are well suited for this role because they outperform 
standard neural networks. 

Indeed, by removing the misclassified examples from the training data 
and eliminating noise and irrelevant features from the original feature set, we 
can significantly simplify the class boundaries and the solution of the 
classification problem. A decision tree derived from such data can be well 
suited for the classification of new observations. 

To describe our technique, let us assume that the polynomial neural 
network performs well enough on the testing data. Next, define the training 
subsets, X0 and X1, consisting of n0 and n1 examples which have been 
correctly assigned by this network to the classes 0 and 1. These examples are 
represented in the new space of features, V, whose dimensionality is now 
equal to m. Then the DT derivation algorithm can be described as follows. 

    T = [];                  % a decision tree T (initially empty)
    V = 1:m;                 % a pool V of features
    find-node(X0, X1, V);

The procedure find-node is invoked with parameters X0, X1, and V. 
This procedure adds a new node to the decision tree T and then recursively 
calls itself as follows: 

    m = number-of-features(V);

    % Search a threshold q and an outcome p.
    for i = 1 to m do
        [q_i, p_i] = search-threshold-and-outcome( );
    end

    [v_j, e_j] = find-feature-dividing X0 and X1;
    f_j = create-new-test( );

    T = [T, f_j];            % add new test to T

    % Calculate the outputs Y0 and Y1
    Y0 = f_j(X0);
    Y1 = f_j(X1);

    V = remove-feature(v_j);
    if V not empty
        % Find the examples A0, A10, A1 and A01:
        A0  = find(Y0 == 0);
        A01 = find(Y0 == 1);     % the errors of 0
        A1  = find(Y1 == 1);
        A10 = find(Y1 == 0);     % the errors of 1

        if A10 not empty
            find-node(X0(A0, V), X1(A10, V), V)
        end
        if A01 not empty
            find-node(X0(A01, V), X1(A1, V), V)
        end
    end
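The recursive procedure above can be sketched in Python as follows. This is only an illustrative reading of the pseudocode, not the authors' implementation: the way a threshold and an outcome are searched (midpoints between sorted feature values, chosen by error count) and the names find_node, x0, x1 and tree are assumptions made for the example.

```python
def find_node(x0, x1, features, tree):
    """Append one single-feature threshold test to `tree`, then recurse.

    x0, x1   -- lists of feature tuples currently treated as class 0 / class 1
    features -- indices of the features still in the pool V
    tree     -- flat list of tests (feature, threshold q, outcome p)
    """
    best = None
    for v in features:
        values = sorted({row[v] for row in x0 + x1})
        for lo, hi in zip(values, values[1:]):
            q = 0.5 * (lo + hi)                    # candidate threshold (assumed search rule)
            for p in (0, 1):                       # output when the value exceeds q
                errors = sum((p if r[v] > q else 1 - p) == 1 for r in x0) \
                       + sum((p if r[v] > q else 1 - p) == 0 for r in x1)
                if best is None or errors < best[0]:
                    best = (errors, v, q, p)
    if best is None:                               # no feature can split the data
        return
    _, v, q, p = best
    tree.append((v, q, p))                         # T = [T, f_j]

    def test(row):
        return p if row[v] > q else 1 - p

    remaining = [f for f in features if f != v]    # V = remove-feature(v_j)
    if remaining:
        a0 = [r for r in x0 if test(r) == 0]
        a01 = [r for r in x0 if test(r) == 1]      # the errors of class 0
        a1 = [r for r in x1 if test(r) == 1]
        a10 = [r for r in x1 if test(r) == 0]      # the errors of class 1
        if a10:
            find_node(a0, a10, remaining, tree)
        if a01:
            find_node(a01, a1, remaining, tree)


x0 = [(0.2, 5.0), (0.4, 1.0), (0.9, 3.0)]          # toy class-0 examples
x1 = [(1.5, 2.0), (2.0, 4.0), (1.2, 0.5)]          # toy class-1 examples
tree = []
find_node(x0, x1, [0, 1], tree)
print(tree)
```

On this toy data the first feature separates the two classes, so the derived tree contains a single test, which mirrors the one-variable tree reported for the EEG data.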



We have used this algorithm to derive a decision tree for recognizing the 
artifacts in the clinical EEGs. First we trained the polynomial network 
described in section 4.4 from the training data which originally was 
represented by 72 features. Then we removed from these data all 30 
misclassified examples and used the 7 discovered features to present the data 
in a new input space. 

To derive a DT from the new data, the preceding algorithm was applied. 
This algorithm derived a simple DT which exploits only one variable, x6, 
the absolute power of subdelta summed over channels C3 and C4, as 
depicted in Figure 14. 



Figure 14. A decision tree rule for classifying the normal EEG segments and artifacts 

Surprisingly, this decision tree misclassified 24 testing examples, 
while the original polynomial network misclassified 28. More experimental 
results can be found in the paper devoted to artifact recognition in 
clinical EEGs [Schetinin & Schult, 2004]. 

We also note that EEG experts can easily understand and interpret this 
decision tree as follows: an EEG segment is an artifact if the value of the 
absolute power of subdelta, x6, is more than 1.081; otherwise, it is a normal 
segment. 
7. CONCLUSION 

Standard neural networks can learn classification rules from real-world 
data well; however, such classification models may not be comprehensible 
for experts. Classification models can become more understandable if 
they are represented in a visual form. To achieve such a representation, data 
mining techniques based on a strategy of searching for a trade-off between 
complexity and accuracy of classification rules are commonly used. In 
contrast to this strategy, the methods described in this chapter allow experts 
to present classification models in visual form and keep their classification 
error down. 

We have presented examples of the application of standard techniques 
and our neural-network techniques to clinical EEG data for the extraction 
of classification models which EEG experts could easily represent visually. On 
testing data the new models performed slightly better than the standard 
feed-forward and GMDH-type networks. Thus, we conclude that our 
neural-network techniques can be successfully used for the visual data 
mining of clinical EEGs. 
9. EXERCISES AND PROBLEMS 

1. Assume a fully connected neural network consists of 5 input nodes, 3 
hidden and 2 output neurons. What is the minimum number of examples 
required to train this network by back-propagation? Why does the user need 
to preset the structure of the neural network? What changes in the 
neural network if the user applies PCA and determines two principal 
components? 

2. Suppose a continuous exclusive OR (XOR) problem is described as 
follows: 

$y = 1$ if $x_1 x_2 > 0$, and $y = 0$ if $x_1 x_2 < 0$, 

where $y$ is the target output, and $x_1 \in [-1, 1]$, $x_2 \in [-1, 1]$ are the 
input variables. 

If the user uses a fully connected neural network, what structure has to 
be preset for this problem? 

3. Assume an evolved cascade neural network which consists of 3 hidden 
neurons and 1 output neuron. How many examples are required to train 
this network? How many neurons are required to solve XOR problem? 

4. Suppose a GMDH-type neural network uses a transfer polynomial 

$y = g(v_1, v_2) = w_0 + w_1 v_1 + w_2 v_2 + w_3 v_1 v_2 + w_4 v_1^2 + w_5 v_2^2$, 

where $v_1$ and $v_2$ are the input variables and $w_0, \ldots, w_5$ are the 
coefficients. 

What minimal number of examples is required to train this network? 
When will GMDH-type neural networks outperform fully connected 
neural networks and vice versa? 

5. When and why will multivariate decision trees outperform decision trees 
which test single variables? Regarding the XOR problem (2.) above, 
which of these techniques is better? 

6. Assume a 4-class problem. How many neurons are required to train linear 
machine? What should be the preset structure of a neural network based 
on pairwise classification for this case? 

7. When and why will multi-class systems based on pairwise classification 
outperform the standard neural networks and decision trees? 
Abstract: Visualization is used in data mining for the visual presentation of already 
discovered patterns and for discovering new patterns visually. Success in both 
tasks depends on the ability to present abstract patterns as simple visual 
patterns. Getting simple visualizations for complex abstract patterns is an 
especially challenging problem. A new approach called inverse visualization 
(IV) is suggested for addressing the problem of visualizing complex patterns. 
The approach is based on specially designed data preprocessing. The 
preprocessing is based on a transformation theorem proved in this chapter. A 
mathematical formalism is derived from the Representative Measurement Theory. 
The possibility of solving inverse visualization tasks is illustrated on 
functional non-linear additive dependencies. The approach is called inverse 
visualization because it does not use data “as is” and does not follow the 
traditional sequence: discover pattern → visualize pattern. The new sequence 
is: convert data to a visualizable form → discover patterns with predefined 
visualization. 

Key words: Visual data mining, simultaneous scaling, non-linear dependency, data 
preprocessing, reverse visualization. 



1. INTRODUCTION 

Visual data mining is a growing area of research and applications [Keim, 
2001; Fayyad, Grinstein & Wierse, 2001; Spence, 2001; Ware, 2000; Mille, 
2001; Soukup & Davidson, 2002]. It includes two visually related tasks: the 
visual discovery of patterns and the visualization of discovered patterns in a 
specific form. This visual form should be perceivable, understandable and 
interpretable in a domain. It is often stated that the visualization should 
be simple, but simplicity is usually judged informally, after the fact. 

The most promising visual discovery approach is a heterogeneous ap- 
proach that combines (a) analytical manipulation (AM) with data to trans- 
form them and (b) pure visual discovery (VD) by interactively observing 
transformed data. 

In this chapter we attempt to formalize the concept of a simple visualiza- 
tion for data mining using ideas from classical physics, specifically from the 
theory of physical structures [Kulakov, 1971]. The next goal is to develop 
an AM technique that can provide a simple visualization. The chapter con- 
cludes with a simulation example showing the application of the AM tech- 
nique on data. 

Traditional visualization generally follows the sequence: 

<patterns> → <visualization>. 

In contrast, visual discovery reverses the sequence: 

<visualization> → <patterns>. 

That is, in visualization, we start from patterns and produce a visualization, 
while in visual discovery, we start from visualization and produce patterns. 

Thus, we call visual discovery an Inverse Visualization Task (IVT) 
with the goal to find data transformations that permit the generation of a 
simple, clear visualization of data and patterns. Success in this endeavor 
depends on the properties of the data transformations and the data mining 
methods involved. Many data mining practitioners share the opinion that 
practically any data mining method will discover meaningful patterns for 
“good” data while few if any will produce meaningful patterns for “poor” 
data. 

One of the goals of IVT is to transform “poor” data into “good” data thus 
permitting a wide variety of data mining methods to be used for the success- 
ful discovery of hidden patterns. 

For now, we will not attempt to define formally “poor” and “good” data. 
Rather, we will show that in classical physics such transformations have 
been used successfully for a long time to discover patterns, which are now 
classical (fundamental) physical laws, without formal definitions of “good” 
data. We note that the laws of classical physics are simple so the problem of 
their visualization is not difficult. 

However, the lessons learned from classical physics can help in other 
domains where patterns do not appear to be simple, but first we need to 
understand the reasons for the simplicity of laws in classical physics. Are 
these reasons specific to physics or can they be exploited for domains such 
as finance, medicine, remote sensing, and image analysis? 

An explanation of simplicity in physics follows from two theories: the 
Representative Measurement Theory [Krantz, Luce, Suppes & Tversky 
1990] and the Physical Structures Theory [Kulakov, 1971; Mikhailichenko, 
1985]. Measurement theory [Krantz et al., 1990, v.1] demonstrates that a 
system of physical quantities and fundamental laws will have a simple 
representation because they are obtained through a procedure that 
simultaneously scales the variables involved in the laws. 

Traditionally, data mining does not involve simultaneous scaling. Note 
that simultaneous scaling is different from the data normalization procedures 
used in neural networks to speed up search; see for example [Rao & Rao, 
1995]. The typical normalization in neural networks transforms the scale of 
each variable independently and non-linearly to some interval, such as [-1, 1]. 
On the other hand, simultaneous scaling of variables x, y and z might 
transform these variables into new scales x', y' and z' so that the law has 
a simple linear form, perhaps y' = x' + z'. In general, laws of classical 
physics show that if all variables included in a law are scaled simultaneously 
then the law can assume a relatively simple form. 
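As a small illustration of this difference, consider a hypothetical multiplicative law y = x·z (chosen here only for the example; it is not taken from the chapter's data). Rescaling each variable independently to [0, 1] leaves the law non-additive, while applying one monotone transformation (the logarithm) to all three variables yields y' = x' + z' exactly. The helper names below are assumptions of the sketch.

```python
import math

xs = [1.0, 2.0, 4.0, 8.0]
zs = [1.0, 3.0, 9.0]


def law(x, z):
    """Hypothetical law used only for this illustration."""
    return x * z


ys = [law(x, z) for x in xs for z in zs]


def min_max(values):
    """Return an independent linear rescaling of one variable to [0, 1]."""
    lo, hi = min(values), max(values)
    return lambda v: (v - lo) / (hi - lo)


def additive_gap(fx, fz, fy):
    """Largest deviation from y' = x' + z' over the grid."""
    return max(abs(fy(law(x, z)) - (fx(x) + fz(z))) for x in xs for z in zs)


# Each variable normalized on its own: the law is still not additive.
gap_independent = additive_gap(min_max(xs), min_max(zs), min_max(ys))

# All variables rescaled together by the same monotone map: the law is additive.
gap_simultaneous = additive_gap(math.log, math.log, math.log)

print(gap_independent, gap_simultaneous)
```

The second gap is zero up to rounding error, which is the sense in which a simultaneous monotone rescaling can give a law a simple linear form.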

The problem of finding efficient, simultaneous scaling transformations 
was not posed and solved by Representative Measurement Theory. This 
theory explains the simplicity effect but lacks a constructive way to achieve it. 
On the other hand, Representative Measurement Theory has a wider area of 
application than physics alone. For instance, psychology has benefited 
significantly from it [Krantz et al., 1971]. This observation raises a hope that 
simultaneous scaling will be beneficial in other areas too. This, of course, 
requires designing simultaneous scaling transformations. 

Fortunately, the theory of Physical Structures provides an answer for this 
problem via the constructive classification of all functional expressions of all 
possible fundamental physical laws [Mikhailichenko, 1985]. Classes defined 
by this classification have an important property — any other functional ex- 
pressions of a physical law can be transformed to one of the given classes by 
a monotone transformation of all involved variables. 

The procedure for deriving such a transformation is the simultaneous 
scaling of these variables. This result shows that every physical law can be 
described as a class of expressions that can be converted to each other by 
monotone transformations of the variables contained in the law. This means 
that all laws can be enumerated in the classification of all functional 
expressions of all possible fundamental physical laws [Mikhailichenko, 1985]. 
All laws of this classification have a simple form and, by extension, the 
problem of their visualization is simple too. All complexity of visualization of 
these laws is thus converted into the design of a monotone transformation 
of the n-tuples of variables involved. 



2. DEFINITIONS 



Let us define a class of functions F, which can be transformed to the 
linear function y = x + z by monotone transformations. There are many possible 
ways to define the class F. It is convenient to assume that F is given through 
a set of axioms. Suppose that a dataset D from a specific domain (e.g., 
finance) represents a set of triples x, y, and z where y = f(x, z) and the function 
f is not known analytically. Suppose that the function f is known only 
through tabulated values from D and possibly some meaningful (for the 
domain) properties in the form of axioms. We assume that: 

• real-world variables x, y and z are mapped to real numbers R by some 
measurement procedures, 

• the order relation on R is not just a numeric relation but it has inter- 
pretation as a real-world relation for the variable y, and 

• in the same way the equality relation on R is interpreted for variables 
x and z. 

We define the class F of functions $f \in F$ on $X_f \times Z_f$, where 
$X_f \subset R$, $X_f \neq \emptyset$, and $Z_f \subset R$, $Z_f \neq \emptyset$. 
Functions from F satisfy the five properties of additive conjoint structure 
[Krantz et al., 1971, p.256]: 

(1). $\forall z_1, z_2\ [\,\exists x\ (f(x, z_1) > f(x, z_2)) \Rightarrow \forall x'\ (f(x', z_1) > f(x', z_2))\,]$ 

(2). $\forall x_1, x_2, x_3, z_1, z_2, z_3$ 

$(f(x_1, z_2) = f(x_2, z_1))\ \&\ (f(x_1, z_3) = f(x_3, z_1)) \Rightarrow (f(x_2, z_3) = f(x_3, z_2))$ 

(3). For any three of $x_1, x_2, z_1, z_2$ the fourth of them exists such that 

$f(x_1, z_2) = f(x_2, z_1)$ 

(4). $\exists x_1, x_2, z\ (f(x_1, z) \neq f(x_2, z))$ 

(5). For any $z_1, z_2$: $z_1 \neq z_2$, if a sequence $x_1, x_2, \ldots, x_i, \ldots$ of 
elements of $X_f$ is determined and satisfies the following properties: 
$\forall i,\ x_i < x_{max}$, 

$f(x_1, z_1) = f(x_2, z_2),\ f(x_2, z_1) = f(x_3, z_2),\ f(x_3, z_1) = f(x_4, z_2),\ \ldots,\ f(x_i, z_1) = f(x_{i+1}, z_2),\ \ldots$ 

then this sequence is finite. 

In addition, properties (2) and (3) should also hold with x replaced by z 
and vice versa. 
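When f is known only through tabulated values, properties such as (1) and (2) can be checked directly on the table. The sketch below is one hedged way to do this for a table indexed by grid points of x (rows) and z (columns); the function names and the small example tables are assumptions made for illustration and are not part of the chapter's data.

```python
from itertools import product


def independence_holds(table):
    """Property (1): if f(x, z1) > f(x, z2) for some x, it holds for every x."""
    n_cols = len(table[0])
    for j1, j2 in product(range(n_cols), repeat=2):
        greater = [row[j1] > row[j2] for row in table]
        if any(greater) and not all(greater):
            return False
    return True


def property_two_holds(table):
    """Property (2): f(x1,z2)=f(x2,z1) and f(x1,z3)=f(x3,z1) imply f(x2,z3)=f(x3,z2)."""
    rows, cols = range(len(table)), range(len(table[0]))
    for i1, i2, i3 in product(rows, repeat=3):
        for j1, j2, j3 in product(cols, repeat=3):
            if (table[i1][j2] == table[i2][j1]
                    and table[i1][j3] == table[i3][j1]
                    and table[i2][j3] != table[i3][j2]):
                return False
    return True


# f(x, z) = x * z tabulated on x, z in {1, 2, 4}: additive after taking logarithms.
product_table = [[x * z for z in (1, 2, 4)] for x in (1, 2, 4)]

# The same law with a sign change in x violates property (1).
signed_table = [[x * z for z in (1, 2)] for x in (-1, 1)]

# A table that is not additive under any monotone rescaling violates property (2).
distorted_table = [[0, 1, 2], [1, 2, 4], [2, 3, 5]]

print(independence_holds(product_table), property_two_holds(product_table))
print(independence_holds(signed_table), property_two_holds(distorted_table))
```

Exact equality is used because the example tables hold integers; for measured data a tolerance or an order-based comparison would be needed.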



3. THEOREM ON SIMULTANEOUS SCALING 



The theorem below is based on axioms (1)-(5) and is used for the design of a 
simultaneous scaling procedure. 

Theorem [Krantz et al., 1971, p.257]: 

1. For any function $f \in F$ there are one-to-one functions $\varphi_x$, $\varphi_z$ and a 
monotone function $\varphi$ such that 

$\varphi f(x, z) = \varphi_x(x) + \varphi_z(z), \quad \langle x, z \rangle \in X_f \times Z_f.$ 

2. If $\varphi'_x(x)$, $\varphi'_z(z)$ are two other functions with the same property, then 
there exist constants $\alpha > 0$, $\beta_1$, and $\beta_2$, such that 

$\varphi'_x(x) = \alpha \varphi_x(x) + \beta_1, \quad \varphi'_z(z) = \alpha \varphi_z(z) + \beta_2$ 

$f(x', z') = \varphi f(\varphi_x(x'), \varphi_z(z'))$ 

where $\varphi$ is a strictly monotone function, and $\varphi_x$, $\varphi_z$ are one-to-one 
functions from F. 

Proof [Krantz et al., 1971, p.264-266]. The proof follows from the fact that 
the set of axioms (1)-(5) represents an additive conjoint structure. 

Let a function $f \in F$ on $X_f \times Z_f$ satisfy the axioms (1)-(5). By virtue of 
axiom (4) there are points on the plane $\langle x_0, z_0 \rangle$ and $\langle x_1, z_0 \rangle$ such that 

$f(x_0, z_0) \neq f(x_1, z_0)$ 

(see Figure 1). 

Rescaling algorithm. Let us simultaneously scale X, Z, and Y (y = f(x, z)) 
as follows: 

• assign value 0 to $x_0$ of the scale X; record it as $x_0 = 0$; 

• assign value 1 to $x_1$; 

• assign value 0 to $z_0$ of scale Z; 

• assign values $f(x_0, z_0) = 0$ and $f(x_1, z_0) = 1$ for function f. 

By virtue of axiom (3), for three elements $x_0$, $z_0$, $x_1$ there exists a fourth, 
$z_1$, such that 

$f(x_0, z_1) = f(x_1, z_0).$ 
Let us link the points $\langle x_0, z_1 \rangle$, $\langle x_1, z_0 \rangle$ as shown in Figure 1. Along this 
line the function has identical values. These values are the values of the Y scale 
(which is not shown on the picture). It is easy to see that these values of x, z, 
and y satisfy the function x + z = y. We take a point $\langle x_1, z_1 \rangle$ and assign 
value $y = f(x_1, z_1) = 2$ for this point. 

Next we again apply axiom (3). At first we apply it to values $x_1$, $z_0$, $z_1$ 
and receive $x_2$ such that $f(x_1, z_1) = f(x_2, z_0)$, and then we apply it to values 
$x_1$, $x_0$, $z_1$ and receive $z_2$ such that $f(x_0, z_2) = f(x_1, z_1)$. After that we assign 
value $y = f(x_0, z_2) = f(x_1, z_1) = f(x_2, z_0) = 2$. Now we consider new points 
$\langle x_2, z_1 \rangle$ and $\langle x_1, z_2 \rangle$. 

To make the given construction possible for all new points $\langle x_0, z_3 \rangle$, 
$\langle x_3, z_0 \rangle$ it is necessary that the values of the function be identical, 
$f(x_2, z_1) = f(x_1, z_2)$, for points $\langle x_2, z_1 \rangle$ and $\langle x_1, z_2 \rangle$. The equality 
$f(x_2, z_1) = f(x_1, z_2)$ follows from axiom (2). 

Figures 2 and 3 present such a transformation. The surface in Figure 2 is 
transformed to the surface in Figure 3 by the simultaneous rescaling of 
variables x, z, and y. It follows from the theorem that if properties (1)-(5) 
hold for some variables x, y, z, then the function $f \in F$ can be converted to 
the function y = x + z by rescaling variables. After this the visualization of 
rescaled data and the function y = x + z is obvious (see Figure 3). 

The rescaling algorithm requires that values of a function f on specific 
pairs of values <x, z> satisfy properties (1)-(5) of the theorem. These 
properties are true for preference relations used in Decision Theory [Keeney & 
Raiffa, 1976], but this is not a universally true condition for other tasks. 
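The construction used in the proof can be sketched numerically for a function that is known analytically. The sketch assumes that f is continuous and increasing in each argument, so that the element promised by axiom (3) can be found by bisection; the helper names and the example law f(x, z) = x·z are assumptions made for illustration only.

```python
def solve_increasing(g, target, lo, hi, tol=1e-12):
    """Bisection for g(t) = target, assuming g is increasing on [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def standard_sequences(f, x0, x1, z0, steps, upper):
    """Build x_0, x_1, ... and z_0, z_1, ... so that f(x_i, z_j) depends on i + j only."""
    xs, zs = [x0, x1], [z0]
    # Axiom (3): find z1 with f(x0, z1) = f(x1, z0).
    zs.append(solve_increasing(lambda z: f(x0, z), f(x1, z0), z0, upper))
    for k in range(1, steps):
        level = f(xs[k], zs[1])                    # this level receives the value k + 1
        xs.append(solve_increasing(lambda x: f(x, z0), level, x0, upper))
        zs.append(solve_increasing(lambda z: f(x0, z), level, z0, upper))
    return xs, zs


def f(x, z):
    """Hypothetical law used for the illustration."""
    return x * z


xs, zs = standard_sequences(f, x0=1.0, x1=2.0, z0=1.0, steps=3, upper=100.0)

# New scales: x_k and z_k receive the value k, and f(x_i, z_j) receives i + j.
for i, x in enumerate(xs):
    print([(i + j, round(f(x, z), 6)) for j, z in enumerate(zs)])
```

In the printed table, cells with the same i + j carry the same original value of f, which is the consistency that axiom (2) guarantees in the proof.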
Figure 2. Data visualization: (a) original data, (b) simultaneously rescaled data. 
See also color plates. 



4. A TEST EXAMPLE 

A test example must satisfy several requirements to be really convincing: 

1. It should contain regularities (patterns) known in advance; 

2. These regularities should have at least a hypothetically meaningful 
interpretation in the domain; 

3. These regularities should not be obvious when data is prescreened 
and visualized prior to rescaling as in Figure 2 (a). 

Table 1, shown in section 5 below, contains data that meet these 
requirements. It is produced in the way described below: 

• Attributes $a_1, a_2, a_3$ and $a_5, a_6, a_7, a_8, a_9$ are created by using a random 
number generator. For instance, attributes $a_1, a_2, a_3$ could model some 
basic independent indicators of product manufacturing. 

• Attribute $a_4$ is a sum of the first two attributes, $a_4 = a_1 + a_2$. 

• Attribute $a_{10}$ is a target attribute; it is equal to some random monotone 
transformation F of the difference $a_4 - a_3$, i.e., 

$a_{10} = F(a_4 - a_3) = F(a_1 + a_2 - a_3).$ 

These attributes may have different interpretations. In one of them, attributes 
$a_5, a_6, a_7, a_8$ and $a_9$ represent noise. They are random and unrelated to the 
target attribute $a_{10}$. A hypothetical interpretation of the regularity 
$F(a_1 + a_2 - a_3)$ could be productivity or production efficiency or revenue. 
Attribute $a_1$ may indicate initial capital (scaled from 0 to 10), attribute $a_2$ 
may indicate quality of management (also scaled from 0 to 10) and attribute 
$a_3$ could be a tax level (scaled from 0 to 10). Attributes $a_1$ and $a_2$ contribute 
positively to revenue while attribute $a_3$ contributes negatively. 
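The construction of the test attributes described above can be sketched as follows. This is a minimal sketch, not the procedure actually used to produce Table 1: the integer 0 to 10 scale for every random attribute, the way the random monotone transformation F is drawn (a sorted list of random levels), and the name make_test_table are all assumptions.

```python
import random


def make_test_table(n_rows, seed=0):
    """Return rows [a1, ..., a10] with a4 = a1 + a2 and a10 = F(a4 - a3)."""
    rng = random.Random(seed)
    # Assumed F: a random increasing map on the 31 possible values of a4 - a3 (-10..20).
    levels = sorted(rng.uniform(0.0, 100.0) for _ in range(31))

    def monotone_f(d):
        return levels[d + 10]

    rows = []
    for _ in range(n_rows):
        a = {i: rng.randint(0, 10) for i in (1, 2, 3, 5, 6, 7, 8, 9)}  # random attributes
        a[4] = a[1] + a[2]                                             # derived attribute
        a[10] = monotone_f(a[4] - a[3])                                # target attribute
        rows.append([a[i] for i in range(1, 11)])
    return rows


table = make_test_table(50)
print(table[0])
```

Because F is drawn at random, the target column carries no visible additive pattern, while the order relations among a1, a2, a3 and a10 are preserved.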

A relatively complex monotone transformation is motivated by an intention: 

• to solve a realistic task. In real-world tasks, if there are any hidden pat- 
terns (regularities), they are usually disguised and significantly cor- 
rupted. Experience shows that monotone regularities are common for 
many data mining tasks. 

• to show unique capabilities of the simultaneous scaling method. 
Traditional methods that do not use simultaneous scaling cannot discover a 
regularity corrupted by a random monotone transformation. The only 
way to do this is to analyze all interpretable order relations 
$<_1, <_2, <_3, \ldots, <_{10}$ for all attributes. These regularities are revealed by the 
simultaneous scaling method. 

• to find meaningful patterns, regularities, and functions. For instance, 
regression analysis typically produces functions that just interpolate data 
without meaningful interpretation in the domain. In contrast, the 
simultaneous scaling method produces meaningful regularities such as 

$\forall a, b\ (a <_1 b\ \&\ a <_2 b \Rightarrow a <_{10} b)$ 

The data in Table 1 encode the following regularities by design: 

$\forall a, b\ (a >_3 b\ \&\ a <_4 b \Rightarrow a <_{10} b)$ 

$\forall a, b\ (a <_3 b\ \&\ a >_4 b \Rightarrow a >_{10} b)$ 

$\forall a, b\ (a <_1 b\ \&\ a <_2 b\ \&\ a >_3 b \Rightarrow a <_{10} b) \qquad (1)$ 

$\forall a, b\ (a >_1 b\ \&\ a >_2 b\ \&\ a <_3 b \Rightarrow a >_{10} b)$ 

5. DISCOVERING SIMULTANEOUS SCALING 

The Discovery System [Kovalerchuk & Vityaev, 2000] can discover all 
monotone regularities, including those shown in (1) above that are actually 
encoded in Table 1 along with random noise. When regularities (1) are 
discovered, a simultaneous monotone rescaling of the data can be arranged and 
the straightforward and simple visualization presented in Figure 3 below will 
be generated. 

Thus the major challenge is discovering the monotone regularities. The 
Discovery System searches sequentially for monotone regularities, starting 
from the simplest ones: 

$\forall a, b\ (a <_i b \Rightarrow a >_{10} b), \quad i = 1, \ldots, 9. \qquad (2)$ 

After testing them we discover the regularity 

$\forall a, b\ (a <_n b \Rightarrow a <_{10} b)$ 

with a statistical confidence level equal to 0.0001. This regularity is not in 
the list of regularities (1) above, although it is true for data from Table 1. 
Next the system tests more complex regularities: 

$\forall a, b\ (a >_i b\ \&\ a <_j b \Rightarrow a <_{10} b), \quad i, j = 1, \ldots, 9$ 

and finds a regularity 

$\forall a, b\ (a >_3 b\ \&\ a <_4 b \Rightarrow a <_{10} b)$ 

with a statistical confidence level equal to 0.025. 

Similarly, another parametric set of hypothetical regularities is generated 
and tested to discover the second regularity in (1). Next we can discover a 
regularity with all three variables in the antecedent if we substitute given 
attributes with parameters i, j and k that are the indexes of attributes. For 
instance, for discovering 

$\forall a, b\ (a <_1 b\ \&\ a <_2 b\ \&\ a >_3 b \Rightarrow a <_{10} b)$ 

we generate a parametric set 

$\forall a, b\ (a <_i b\ \&\ a <_j b\ \&\ a >_k b \Rightarrow a <_{10} b), \quad i, j, k = 1, \ldots, 9$ 

and test it. The test reveals the needed regularity with a confidence level 
equal to 0.1. 
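Testing one hypothetical regularity from such a parametric set amounts to counting, over all ordered pairs of records, how often the antecedent holds and how often the consequent then holds as well. The sketch below shows only this counting step on toy data; it does not reproduce the Discovery System or its statistical test, and the function name, the toy records and the cubic stand-in for F are assumptions.

```python
import random
from itertools import permutations


def count_rule(rows, antecedent, consequent):
    """Count pairs (a, b) where the antecedent holds and where the rule is confirmed.

    antecedent -- list of (attribute, op) with op in {"<", ">"}
    consequent -- a single (attribute, op)
    """
    def holds(a, b, attr, op):
        return a[attr] < b[attr] if op == "<" else a[attr] > b[attr]

    applicable = confirmed = 0
    for a, b in permutations(rows, 2):
        if all(holds(a, b, attr, op) for attr, op in antecedent):
            applicable += 1
            if holds(a, b, *consequent):
                confirmed += 1
    return applicable, confirmed


# Toy records keyed by attribute index; a10 is a monotone function of a4 - a3.
rng = random.Random(1)
rows = []
for _ in range(40):
    r = {i: rng.randint(0, 10) for i in (3, 4, 5)}
    r[10] = (r[4] - r[3]) ** 3          # stand-in for the monotone transformation F
    rows.append(r)

encoded = count_rule(rows, [(3, ">"), (4, "<")], (10, "<"))   # encoded by design
noise = count_rule(rows, [(5, "<")], (10, "<"))               # noise attribute a5
print(encoded, noise)
```

For the rule encoded by design every applicable pair confirms the consequent, whereas a rule built on a noise attribute is confirmed only part of the time; a statistical criterion is then needed to decide which counts are significant.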



6. ADDITIVE STRUCTURES IN DECISION 
MAKING 

In section 2, we assumed an additive conjoint structure that permitted us 
to build a simple linear visualization. In this section, we discuss the 
motivation for using an additive structure from a decision-making viewpoint, 
invoking an approach presented in [Keeney & Raiffa, 1976]. 

A decision-making problem is considered as a tradeoff between 
contradictory goals such as maximizing profit and minimizing risk. The tradeoff 
means that we try to substitute a chunk A1 of goal G1 that we cannot satisfy 
with a chunk A2 of another goal G2 that we can satisfy. The tradeoff 
assumption must be such that substitution makes sense. This is, in essence, 
leading us to an additive conjoint structure assumption. Under this widely 
accepted assumption, the practical issue is finding the chunks A1 and A2. 

One of the options that can be used to solve this problem is the explicit 
way, where an SME (subject matter expert) declares, say, that A1 = 3 and A2 = 5 
are equivalent for substitution purposes; that is, the SME formalizes preferences 
as a model. This is typically a very difficult task. Another approach is the 
implicit approach. In this approach, we just ask an SME to define preferences 
for, say, about 100 pairs of multi-criteria decisions. 

An SME can say that a decision with attributes $(a_1, a_2, \ldots, a_n) = (1, 5, \ldots, 7)$ 
is better than a decision with attributes $(a_1, a_2, \ldots, a_n) = (3, 2, \ldots, 5)$. 
Alternatively, we may ask an SME to assign a priority to each 
$(a_1, a_2, \ldots, a_n)$ alternative using a 0 to 100 percentage scale. Table 1 can 
be interpreted in this way, where $a_{10}$ can be viewed as a priority. 

Both implicit alternatives provide us with a partially defined scalar 
priority function, v: 

$v(x_1, \ldots, x_n) > v(y_1, \ldots, y_n) \Leftrightarrow (x_1, \ldots, x_n) >_{SME} (y_1, \ldots, y_n).$ 

Function $v(x_1, \ldots, x_n)$ is additive if $v(x_1, \ldots, x_n) = v_1(x_1) + v_2(x_2) + \ldots + v_n(x_n)$. 
There are several sets of axioms known for the relation $>_{SME}$. If a set of such 
axioms is assumed to be true, then theorems can be proved that a numeric 
additive function v exists. One of these sets of axioms is presented below. 

To be able to benefit from such mathematical results, we need to be able 
to test that the axioms are satisfied for an individual task and a dataset. 

Two options are available: (1) to test axioms on a tabulated function v, 
such as that presented in Table 1, where $v(a_1, a_2, \ldots, a_9) = a_{10}$, and (2) to test 
axioms on tabulated SME preferences $\{p_{ij}\}$ that record relations between 
different pairs of alternatives: 

$p_{ij} = 1 \Leftrightarrow x_i >_{SME} x_j,$ 

where $x_i = (x_{i1}, \ldots, x_{in})$ and $x_j = (x_{j1}, \ldots, x_{jn})$. 

However, testing axioms provides only a conclusion that an additive 
function v exists, but functions $v_1(x_1), v_2(x_2), \ldots, v_n(x_n)$ would still need to be 
built. To build v, we need to apply a simultaneous scaling procedure to 
$x_1, x_2, \ldots, x_n$. Then, when the functions $v_1(x_1), v_2(x_2), \ldots, v_n(x_n)$ are found, we 
need to analyze what they mean. To do this, we can view a dataset using these 
functions with new axes (scales) as shown in Figure 2(b). It is a linear 
surface. The SME can analyze how well this sum corresponds to the estimates 
of $v(x_1, \ldots, x_n)$. Thus visualization provides a simple representation of SME 
knowledge that can be communicated to another SME and tested 
independently. If a regularity v is multiplicative, 
$v(x_1, \ldots, x_n) = v_1(x_1) v_2(x_2) \ldots v_n(x_n)$, then 

$\ln v(x_1, \ldots, x_n) = \ln v_1(x_1) + \ln v_2(x_2) + \ldots + \ln v_n(x_n),$ 

that is, the logarithm of v is an additive function. Thus, we can use the same 
technique for multiplicative regularities. 



7. PHYSICAL STRUCTURES 

The simultaneous scaling linearization and visualization procedure 
described in section 5 is fully applicable to physical structures. It is based on 
the theorem that is described below. In addition to this theorem, another 
theorem was proved by [Vityaev, 1985], which states that every physical 
structure of the rank (2, 2) satisfies all axioms (1)-(5) of the additive conjoint 
structure described in section 2. 

The most general characteristic of all physical laws is that they are 
equally applicable to all objects. This fundamental property permits the 
derivation of the structure of physical laws by formulating functional equations of 
a special type and solving them. 

Let us consider two arbitrary sets of objects: set M with elements i, k, ... 
and set N with elements $\alpha, \beta, \ldots$ . Let us further suppose that each pair 
$i \in M$, $\alpha \in N$ is mapped to a real number $a_{i\alpha} \in R$ by some experiment; that is, 
the set $M \times N$ is mapped to the matrix $A = \| a_{i\alpha} \|$ of such numbers. 

If sets M and N are two sets of physical objects of different type, then the 
matrix $\| a_{i\alpha} \|$ is the result of experiments that describe the relationship 
between objects $i \in M$ and $\alpha \in N$. 

We will say that the physical structure of the order (r, s) is defined on 
sets M and N if a functional equation 

$\Phi(a_{i\alpha}, a_{i\beta}, \ldots, a_{i\gamma}, a_{k\alpha}, a_{k\beta}, \ldots, a_{k\gamma}, \ldots, a_{q\alpha}, a_{q\beta}, \ldots, a_{q\gamma}) = 0 \qquad (3)$ 

is satisfied for any $r \cdot s$ real numbers from matrix $A = \| a_{i\alpha} \|$, 

$a_{i\alpha}, a_{i\beta}, \ldots, a_{i\gamma}$ 
$a_{k\alpha}, a_{k\beta}, \ldots, a_{k\gamma}$ 
$\ldots$ 
$a_{q\alpha}, a_{q\beta}, \ldots, a_{q\gamma}$ 

that are located on the intersection of any r rows i, k, ..., q and any s columns 
$\alpha, \beta, \ldots, \gamma$. 

The function $\Phi$ must not depend on: 

1. the selection of the r objects from the set M, $M_r = \{ i, k, \ldots, q \} \subset M$, 
and 

2. the selection of the s objects from the set N, $N_s = \{ \alpha, \beta, \ldots, \gamma \} \subset N$. 

We assume that the function $\Phi$ is analytical and cannot be presented as a 
superposition of other analytical functions with a smaller number of 
variables. 

We will say that the functional equation (3) gives us a physical law of 
order (r, s) that is invariant relative to the selection of finite sets $M_r$ and $N_s$ from 
sets M and N. The equation (3) is a symbolically written, infinite system of 
functional equations relative to an unknown function $\Phi(x_{11}, x_{12}, \ldots, x_{rs})$ of 
$r \cdot s$ variables and one unknown infinite matrix $A = \| a_{i\alpha} \|$, which represents 
one real-valued function $a_{i\alpha}$ of two nonnumeric arguments i and $\alpha$ from M 
and N. 

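Mikhailichenko [Mikhailichenko, 1972] solved this system of equations 
and derived the analytical expressions for all possible physical laws that 
satisfy the equation (3). He proved a theorem stating that functions $\Phi$ and 
$a_{i\alpha}$ may have only one of the following forms: 

1. for r = s = 2 

$a_{i\alpha} = \Psi^{-1}(x_i + \xi_\alpha),$ 

$\Psi(a_{i\alpha}) - \Psi(a_{i\beta}) - \Psi(a_{k\alpha}) + \Psi(a_{k\beta}) = 0;$ 

2. for r = 4, s = 2 

$a_{i\alpha} = \Psi^{-1}[(x_i \xi_\alpha^{1} + \xi_\alpha^{2})/(x_i + \xi_\alpha^{3})],$ 

$\begin{vmatrix} \Psi[a_{i\alpha}]\Psi[a_{i\beta}] & \Psi[a_{i\alpha}] & \Psi[a_{i\beta}] & 1 \\ \Psi[a_{k\alpha}]\Psi[a_{k\beta}] & \Psi[a_{k\alpha}] & \Psi[a_{k\beta}] & 1 \\ \Psi[a_{l\alpha}]\Psi[a_{l\beta}] & \Psi[a_{l\alpha}] & \Psi[a_{l\beta}] & 1 \\ \Psi[a_{m\alpha}]\Psi[a_{m\beta}] & \Psi[a_{m\alpha}] & \Psi[a_{m\beta}] & 1 \end{vmatrix} = 0;$ 

3. for r = s ≥ 3 

$a_{i\alpha} = \Psi^{-1}(x_i^{1}\xi_\alpha^{1} + \ldots + x_i^{r-2}\xi_\alpha^{r-2} + x_i^{r-1}\xi_\alpha^{r-1}),$ 

$\begin{vmatrix} \Psi[a_{i\alpha}] & \Psi[a_{i\beta}] & \ldots & \Psi[a_{i\gamma}] \\ \Psi[a_{k\alpha}] & \Psi[a_{k\beta}] & \ldots & \Psi[a_{k\gamma}] \\ \ldots & \ldots & \ldots & \ldots \\ \Psi[a_{q\alpha}] & \Psi[a_{q\beta}] & \ldots & \Psi[a_{q\gamma}] \end{vmatrix} = 0,$ 

and also 

$a_{i\alpha} = \Psi^{-1}(x_i^{1}\xi_\alpha^{1} + \ldots + x_i^{r-2}\xi_\alpha^{r-2} + x_i^{r-1} + \xi_\alpha^{r-1}),$ 

$\begin{vmatrix} 0 & 1 & 1 & \ldots & 1 \\ 1 & \Psi[a_{i\alpha}] & \Psi[a_{i\beta}] & \ldots & \Psi[a_{i\gamma}] \\ 1 & \Psi[a_{k\alpha}] & \Psi[a_{k\beta}] & \ldots & \Psi[a_{k\gamma}] \\ \ldots & \ldots & \ldots & \ldots & \ldots \\ 1 & \Psi[a_{q\alpha}] & \Psi[a_{q\beta}] & \ldots & \Psi[a_{q\gamma}] \end{vmatrix} = 0;$ 

4. for r = s + 1 ≥ 3 

$a_{i\alpha} = \Psi^{-1}(x_i^{1}\xi_\alpha^{1} + \ldots + x_i^{r-2}\xi_\alpha^{r-2} + \xi_\alpha^{r-1}),$ 

$\begin{vmatrix} 1 & \Psi[a_{i\alpha}] & \ldots & \Psi[a_{i\gamma}] \\ 1 & \Psi[a_{k\alpha}] & \ldots & \Psi[a_{k\gamma}] \\ \ldots & \ldots & \ldots & \ldots \\ 1 & \Psi[a_{q\alpha}] & \ldots & \Psi[a_{q\gamma}] \end{vmatrix} = 0;$ 

5. for r - s ≥ 2, except the case r = 4, s = 2, physical structures do not 
exist. 

Here $\Psi$ is a strictly monotone function of one variable in some vicinity; 
$\Psi^{-1}$ is its inverse function; $x_i$, $\xi_\alpha$ are independent parameters. 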
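The simplest case, r = s = 2, can be checked numerically. The sketch below builds a matrix $a_{i\alpha} = \Psi^{-1}(x_i + \xi_\alpha)$ for an assumed choice $\Psi = \ln$ (so $\Psi^{-1} = \exp$) and arbitrary illustrative parameter values, and verifies that the four-term relation of form 1 vanishes for every pair of rows and every pair of columns. The choice of $\Psi$ and the numbers are assumptions of the example.

```python
import math
from itertools import combinations

psi, psi_inv = math.log, math.exp        # assumed strictly monotone function and its inverse

x = [0.1, 0.7, 1.3]                      # parameters x_i of the objects from M
xi = [0.0, 0.5, 2.0]                     # parameters xi_alpha of the objects from N

# "Experimental" matrix a[i][alpha] = Psi^{-1}(x_i + xi_alpha).
a = [[psi_inv(x_i + xi_a) for xi_a in xi] for x_i in x]


def law_residual(a, i, k, alpha, beta):
    """Left-hand side of the rank (2, 2) law for rows i, k and columns alpha, beta."""
    return (psi(a[i][alpha]) - psi(a[i][beta])
            - psi(a[k][alpha]) + psi(a[k][beta]))


max_residual = max(
    abs(law_residual(a, i, k, al, be))
    for i, k in combinations(range(len(x)), 2)
    for al, be in combinations(range(len(xi)), 2)
)
print(max_residual)
```

The residual is zero up to rounding error regardless of which two rows and two columns are selected, which is the invariance that defines a physical structure; after rescaling by $\Psi$ the matrix is simply additive in $x_i$ and $\xi_\alpha$.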



8. CONCLUSION 

Evidence that an additive conjoint structure plays a fundamental role in 
two very different fields, multi-criteria decision making and fundamental 
physical laws, provides the basis for our belief that it can also play the 
same fundamental role in other fields. Thus, simultaneous rescaling has 
great potential as a major tool in visual data mining for the simplification of 
patterns to be visualized in a variety of fields. Further study will be directed 
at providing computationally efficient simultaneous rescaling algorithms for 
multi-dimensional data. 



9. EXERCISES AND PROBLEMS 

1. Define an additive conjoint structure for $f \in F$ on $X_f \times W_f \times Z_f$ as a 
generalization of such a structure for $f \in F$ on $X_f \times Z_f$ presented in section 2. 

2. Formulate an analogue of the theorem presented in section 3 for the 3-D 
function $f \in F$ on $X_f \times W_f \times Z_f$. 

3. Develop a rescaling algorithm for 3-D similar to that presented in section 
3 for 2-D. 

Advanced 

4. Solve the problems in exercises 1-3 for the n-dimensional case. 

5. Formulate axiom (1) presented in section 2 for n dimensions (n ≥ 3) and 
prove that axiom (2) presented in section 2 for n = 2 follows from 
axiom (1) for n ≥ 3 (Lemma 14 [Krantz et al., 1971, p.306]). 
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Abstract: This chapter describes a new technique for extracting patterns and relations 

visually from multidimensional binary data using monotone Boolean func- 
tions. Visual Data Mining has shown benefits in many areas when used with 
numerical data, but that technique is less beneficial for binary data. This prob- 
lem is especially challenging in medical applications tracked with binary 
symptoms. The proposed method relies on monotone structural relations
between Boolean vectors in the n-dimensional binary cube, $E^n$, and visualizes
them in 2-D as chains of Boolean vectors. Actual Boolean vectors are laid out
on this chain structure. Currently the system supports two visual forms: the 
multiple disk form (MDF) and the “Yin/Yang” form (YYF). In the MDF, 
every vector has a fixed horizontal and vertical position. In the YYF, only the 
vertical position is fixed. 

Key words: Visual Data Mining, explicit data structure, Boolean data, Monotone
Boolean Function, Hansel Chains, Binary Hypercube.



1. INTRODUCTION 

The goal of visual data mining (VDM) is to help a user to get a feeling
for the data, to detect interesting knowledge, and to gain a deep visual
understanding of the data set [Beilken & Spenke, 1999]. One especially
important aspect of visual data mining is visualizing the border between
patterns. A visual result in which the border is simple and patterns are far
away from each other matches our intuitive concept of a pattern and serves
as important support that the data mining result is robust and not accidental.

VDM methods have shown benefits in many areas when used with numerical
data, but these methods do not address the specifics of binary data, where
there is little or no variability in the visual representation of objects
for each individual Boolean attribute. VDM is an especially challenging task
when data richness should be preserved without the excessive aggregation that
often happens with simple and intuitive presentation graphics such as bar
charts [Keim, Hao, Dayal, & Hsu, 2002]. Another challenge is that often
such data lack natural 3-D space and time dimensions [Groth, 1998] and
instead require the visualization of an abstract feature.

The purpose of this chapter is to develop a technique for visualizing and
discovering patterns and relations from multidimensional binary data using
the technique of monotone Boolean functions, which are also reviewed at
the end of the chapter. We begin with an analysis of the currently available
methods of data visualization.

Glyphs. A glyph is a 2-D or 3-D object (icon, cube, or more complex
“Lego-type” object). Glyph or iconic visualization is an attempt to encode
multidimensional data within the parameters of the icons, such as the shape,
color, transparency, and orientation [Ebert, Shaw, Zwa, Miller & Roberts, 1996;
Post, van Walsum, Post & Silver, 1995; Ribarsky, Ayers, Eble & Mukherja,
1994].

Typically, glyphs can visualize up to nine attributes (three positions x, y, 
and z; three size dimensions; color; opacity; and shape). Texture can add 
more dimensions. Shapes of the glyphs are studied in [Shaw, Hall, Blahut, 
Ebert & Roberts, 1999], where it was concluded that with large super- 
ellipses, about 22 separate shapes can be distinguished on the average. An 
overview of multivariate glyphs is presented in [Ward, 2002]. This overview 
includes a taxonomy of glyph placement strategies and guidelines for devel- 
oping such a visualization. Some glyph methods use data dimensions as po- 
sitional attributes to place glyphs; other methods place glyphs using implicit 
or explicit structure within the data set. 

From our viewpoint, the placement based on the use of data structure is a 
promising approach. We believe that the placement of glyphs on a data 
structure is a way to increase the data dimensions that can be visualized. We 
call this the GPDS approach (Glyph Placement on a Data Structure). It is 
important to notice that in this approach, some attributes are implicitly en- 
coded in the data structure while others are explicitly encoded in the glyph. 
Thus, if the structure carries ten attributes and a glyph carries nine attributes, 
we can encode a total of nineteen attributes. The number of glyphs that can 
be visualized is relatively limited because of possible glyph overlap and oc- 
clusion. 

Spiral Bar and other techniques. Alternative techniques such as Generalized
Spiral and Pixel Bar Chart are developed in [Keim, Hao, Dayal & Hsu, 2002].
These techniques work with large data sets without overlapping, but only with
a few attributes (these range from a single attribute to perhaps four to six
attributes). Another set of visualization methods, known as Scatter, Splat,
Map, Tree, and Evidence Visualizer, is implemented in MineSet (Silicon
Graphics), which permits up to eight dimensions to be shown on the same plot
by using color, size, and animation of different objects [Last & Kandel,
1999].

Parallel coordinate techniques. This visualization [Inselberg & Dims- 
dale, 1990] can work with ten or more attributes, but suffers from record 
overlap and thus is limited to tasks with well-distinguished cluster records. 
In parallel coordinates, each vertical axis corresponds to a data attribute
($v_i$) and a line connecting points on each parallel coordinate corresponds
to a record. Figure 1 depicts the vectors

01010;11010;01110;01011;01111;11011;11111;10101;11101;10111 (1) 

Can we discover a regularity that governs the dataset in Figure 1? This
figure represents these data in parallel coordinates. It is difficult, but the
regularity is a simple monotone Boolean function
$(x_2 \,\&\, x_4) \lor (x_1 \,\&\, x_3 \,\&\, x_5)$. This function is true for
all vectors from (1).
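
The claim that this function is true for every record in (1) can be checked
mechanically. The following is a minimal sketch, assuming each record is
written as a bit string x1 x2 x3 x4 x5; the names VECTORS and f are
illustrative and do not come from the original system.

```python
# Vectors from (1), written as bit strings x1 x2 x3 x4 x5.
VECTORS = ["01010", "11010", "01110", "01011", "01111",
           "11011", "11111", "10101", "11101", "10111"]

def f(bits):
    """Evaluate (x2 & x4) v (x1 & x3 & x5) on a 5-character bit string."""
    x1, x2, x3, x4, x5 = (c == "1" for c in bits)
    return (x2 and x4) or (x1 and x3 and x5)

# True only if the function holds for every record listed in (1).
all_true = all(f(v) for v in VECTORS)

# Enumerate the whole 5-D binary cube to count the vectors that satisfy f.
cube = [format(i, "05b") for i in range(32)]
true_count = sum(1 for v in cube if f(v))
print(all_true, true_count)
```

The check is immediate in code, whereas the same regularity is hard to read
from the parallel coordinates plot.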




Figure 1. Ten Boolean records in parallel coordinates 

Limitations. Technically, the number of dimensions is limited only by 
the screen resolution. However, this is typically too overwhelming to permit 
any real understanding of the data [Goel, 1999]. Serious limitations in tradi- 
tional visualization techniques are summarized in [Last & Kandel, 1999]: 

• The subjectivity of the visual representation can lead to different
conclusions from the same data.

• The poor scalability of visual data analysis can cause it to fail when
representing hundreds of attributes.

• Humans are unable to perceive more than six to eight dimensions on the
same graph.

• The slow speed of manual interactive examination of the multi-dimensional,
multi-color charts is a drawback.

We are interested in developing a technique that can work with ten or more
Boolean attributes. Many data mining problems can be encoded using Boolean
vectors, where each record is a set of binary values {0; 1} and each record
belongs to one of two classes (categories) that are also encoded as 0 and
1. For instance, a patient can be represented as a Boolean vector of symptoms
along with an indication of the diagnostic class (e.g., benign or malignant
tumor) [Kovalerchuk, Vityaev & Ruiz, 2000, 2001].

For n-dimensional Boolean attributes, traditional glyph-based visualiza- 
tions are useful but somewhat limited. Attributes of a Boolean vector can be 
encoded in glyph lengths, widths, heights, and other parameters. There are 
only two values for the length, width, and other parameters for each Boolean 
vector. Thus, there is not much variability in visual representation of objects. 
When plotted as nodes in a 3-D binary cube, many objects will not be visu- 
ally separated. 

The approach and methods described below do not follow the traditional
glyph approaches that would put n-dimensional Boolean vectors (n > 3) into
3-D space, making them barely distinguishable. The methods rely on monotone
structural relations between Boolean vectors in the n-dimensional binary
cube, $E^n$. Data are visualized in 2-D as chains of Boolean vectors.
Currently, the system supports two visual forms: the Multiple Disk Form
(MDF) and the “Yin Yang” Form (YYF).



2. A METHOD FOR VISUALIZING DATA 

The numeric order of data layout. Consider a set V of n-dimensional
Boolean vectors similar to those shown in (1) above. Every n-dimensional
Boolean data set V can be encoded as a Boolean function in a
disjunctive normal form (DNF) or conjunctive normal form (CNF). Thus,
visualization of a Boolean data set is equivalent to visualization of a
Boolean function. Next, every Boolean function can be decomposed into a set of
monotone Boolean functions [Kovalerchuk et al., 1996].

The monotone structure is important for data mining tasks, because
most data mining methods are based on the hypothesis of local compactness:
if two objects have similar features, then they belong to the same class.
In this chapter, we assume that the data satisfy the property of
monotonicity, that is, if vector a belongs to class 1, then a vector b that is
greater than or equal to a also belongs to class 1. The formal definition of
monotonicity is given in section 6. Informally, monotonicity means that if a
patient with symptoms a has a malignant tumor then another person with
symptoms b that include all a symptoms and some additional symptoms
most likely also has a malignant tumor.
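
The monotonicity assumption can be tested directly on labeled data. The
following is a minimal sketch, assuming cases are stored as bit strings mapped
to a class label; the function names and the four sample cases are
hypothetical and serve only as an illustration.

```python
def geq(b, a):
    """True if bit string b is component-wise greater than or equal to a."""
    return len(a) == len(b) and all(y >= x for x, y in zip(a, b))

def monotonicity_violations(cases):
    """Return pairs (a, b) with a in class 1 and b >= a, but b in class 0."""
    return [(a, b)
            for a, class_a in cases.items() if class_a == 1
            for b, class_b in cases.items() if class_b == 0 and geq(b, a)]

# Hypothetical symptom vectors: 1 = malignant, 0 = benign.
cases = {"0110": 1, "0111": 1, "1110": 0, "0001": 0}
violations = monotonicity_violations(cases)
print(violations)
```

An empty result means the labeled data are consistent with monotonicity; each
returned pair is a candidate for closer inspection.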

Below we describe the structure used to allocate these Boolean vectors in
2-D rendering. Boolean vectors are ordered vertically by their Boolean norm
(sum of “1”s), with the largest vectors rendered on the top, starting from
(11111).

Our intent is to visualize the border between the two classes in a clear
way. Each vector will be first placed in the view and then drawn as a colored
bar: white for the 0 class, black for the 1 class.

Figure 2 depicts the hierarchy of levels of Boolean vectors with n = 10,
that is, 11 levels from 0 to 10, where levels 0 and 10 contain vectors (00000
00000) and (11111 11111), respectively. Several vectors are shown in Figure
2 as small bars. The bar on the top row represents vector (11111 11111). The
vector on the third row cannot be identified from this figure without
additional assumptions, but we can tell for sure that it has eight “1”s,
because of its location on the third line. To be able to identify in a more
specific way the location of the “0”s in this vector, we need to assume that
the horizontal positions (columns) are associated with specific numbers.

For instance, we can assume that all vectors with the same norm are
ordered, where the leftmost column can be reserved for the vector (11111
11100) and the rightmost position can be reserved for vector (00111 11111)
in the third row.







Figure 2. Level hierarchy of Boolean vectors

Other vectors can occupy fixed positions in between based on their
numeric value (binary/decimal), where Boolean vectors are interpreted as
numbers with the low-order bits located on the right. In this case, since the
bar is centered in its row, the vector in row 3 is an average of binary
numbers (not vectors) $0011111111_2$ and $1111111100_2$ and that can be
computed. Here the subscript 2 indicates a binary number. We call this
visualization a Table Form Visualization (TFV).

Note that for n = 10 the maximum number of vectors on each row is 252,
that is, the number of combinations for five “1”s out of ten. On a display
with a horizontal screen resolution of 1024 = $2^{10}$ pixels, we can use 4
pixels per bar (252 × 4 = 1008 pixels) and can easily visualize
10-dimensional space in 2-D as it is shown in Figure 2.

Using only one pixel per bar we can accommodate 12 binary dimensions,
since we would have 924 combinations for choosing six “1”s out of 12 and
that is still less than 1024. With a higher resolution screen and/or multiple
monitors, we could increase the dimensionality, but this has obvious limits
of about n = 14, where 3432 pixels are needed on a row.
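
The row widths quoted above are central binomial coefficients, so the screen
limits can be recomputed for any resolution. The following is a minimal
sketch; the function names widest_row and max_dimension are illustrative
assumptions, not part of the described system.

```python
from math import comb

def widest_row(n):
    """Largest number of vectors on one level of the n-D binary cube."""
    return comb(n, n // 2)

def max_dimension(screen_width, pixels_per_bar):
    """Largest n whose widest row still fits into one screen row."""
    n = 1
    while widest_row(n + 1) * pixels_per_bar <= screen_width:
        n += 1
    return n

# Row widths for the dimensions discussed in the text.
widths = {n: widest_row(n) for n in (10, 12, 14)}
print(widths)
print(max_dimension(1024, 4), max_dimension(1024, 1))
```

With a 1024-pixel row this gives n = 10 at four pixels per bar and n = 12 at
one pixel per bar, in agreement with the figures above.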

Several options are available to deal with this exponentially growing
dimensionality. One of them is grouping, that is, showing high dimensional
data by their projections to, say, 12-D, which is much larger than the
traditional conversion to 2-D. Then methods such as principal components can
be used with visualization of the first 12 principal components instead of
the first two principal components.

Beyond this, we need to notice that for n = 20, the number of elements in
the space is $2^{20}$ = 1,048,576. If a dataset contains 8192 = $2^{13}$
vectors, then they would occupy no more than $2^{13}/2^{20} = 2^{-7}$ = 1/128
fraction of the total space, that is, less than 1%. This means that a
visualization like that shown in Figure 2 may not use 99% of the screen.
Thus, the visualization columns that are not used can be reduced in order to
enable the visualization of vectors with n = 20 or more.

The visualization shown in Figure 3 is a modification of the visualization
shown in Figure 2, where all repeating vectors are deleted. In Figure 2, the
top and bottom vectors are repeated 252 times. Each level in Figure 3 is
called a disk and the entire visualization is called the multiple disk form
(MDF). In the MDF, every vector has a fixed horizontal and vertical
position as shown in Figure 3.

3. METHODS FOR VISUAL DATA COMPARISON 

In sections 3 and 4, we detail procedures for MDF that successively provide
an increased ability for making visual comparisons of Boolean vectors
chosen from a set of n-dimensional Boolean vectors V. Examples of
applications of these procedures are provided in section 5.




Figure 3. Multiple disk visualization 



Location-based procedure. The Boolean vectors are placed in the disk
by their level as explained in the previous section. Specifically, vectors are
placed horizontally inside the appropriate disk using a placement procedure
based on natural numerical order. Each binary vector is converted to its
decimal equivalent. For instance, the decimal equivalent of the vector
0000000010 would be $2_{10}$. Each vector is then placed horizontally in its
disk based on its value, with value 0 being on the right side. Call this
procedure P1.

The advantage of this procedure is that it allows the user to compare more
than one binary dataset or Boolean function at a time. This is possible
because each vector is always assigned the same fixed location on the disks
based on its value. If two data sets or Boolean functions are equal then they
have exactly the same layout.
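
The fixed placement of P1 can be sketched as a lookup table from vectors to
positions. The code below is an illustration under a stated assumption: within
a level, columns are numbered from the left in decreasing numeric value, so
the smallest value is rightmost. The exact pixel coordinates used by the
actual system are not given in the text, and the name p1_layout is
hypothetical.

```python
from itertools import combinations

def p1_layout(n):
    """Map every n-bit vector to a fixed (level, column) pair.

    Assumption for this sketch: within a level, columns are numbered from
    the left in decreasing numeric value, so the smallest value is rightmost.
    """
    layout = {}
    for level in range(n + 1):
        # All numeric values whose binary form has exactly `level` ones.
        values = [sum(1 << i for i in ones)
                  for ones in combinations(range(n), level)]
        for column, value in enumerate(sorted(values, reverse=True)):
            layout[format(value, f"0{n}b")] = (level, column)
    return layout

layout = p1_layout(10)
print(layout["1111111100"], layout["0011111111"], layout["0000000010"])
```

Because the table depends only on n, two data sets of the same dimension share
the same positions, which is what makes the direct comparison possible.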

Procedure P1 produces the same border scheme for different Boolean
functions and datasets, a condition that is necessary for direct comparison.
In general, direct comparisons are not always easy. Moreover, often the goal
is finding the differences between functions rather than similarities. If two
datasets or functions are different then their difference will be revealed by
their layout on the disks.

Chain-based procedure. P1 can only be used for comparing two data
sets or functions, but does not really visualize any border or structure of the
Boolean function; therefore, we have created the second procedure, P2. This
procedure for MDF is based on the decomposition of the binary cube, $E^n$,
into chains.

For instance, the vectors given in (1) above form several natural chains as
shown in Table 1 where the gray elements appear in several chains. Chain 1
contains four elements starting from (01010) and ending up with (11111).
Each following element is greater than the previous one, that is, some 0
positions are exchanged for 1’s as in (01010) and (11010) where the first
position is changed to 1. We also note that each vector in the second row has
two 1’s. Similarly each vector in the third and fourth rows has three or four
1’s. As previously noted such numbers indicate the level of the Boolean vector
and are also referred to as its norm.



Table 1. Boolean data chains

Chain 1    Chain 2    Chain 3    Chain 4    Chain 5
01010      01010      -          -          -
11010      01011      01110      10101      -
11011      01111      01111      11101      10111
11111      11111      11111      11111      11111



While vectors on the same chain are ordered, vectors on different
chains may not be ordered. There is only a partial order on Boolean vectors.

The partial order is defined as follows: vector $a = (a_1, a_2, \ldots, a_n)$
is greater than or equal to vector $b = (b_1, b_2, \ldots, b_n)$ if
$a_i \ge b_i$ for every $i = 1, 2, \ldots, n$. This partial order means that
chains may overlap as shown in Table 1. We will use the notation $a \ge b$ if
Boolean vector a is greater than or equal to Boolean vector b. A set of
vectors $v_1, v_2, \ldots, v_n$ is called a chain if

$v_1 \ge v_2 \ge \ldots \ge v_n$.
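
The partial order and the chain condition are easy to state in code. The
following is a minimal sketch, assuming vectors are written as bit strings;
the chains are those read from Table 1 and listed from the largest vector
down, and the helper names geq and is_chain are illustrative.

```python
def geq(a, b):
    """a >= b in the component-wise partial order on bit strings."""
    return len(a) == len(b) and all(x >= y for x, y in zip(a, b))

def is_chain(vectors):
    """True if v1 >= v2 >= ... >= vn holds for the listed vectors."""
    return all(geq(a, b) for a, b in zip(vectors, vectors[1:]))

# Chains as read from Table 1, written from the largest vector down.
chains = [
    ["11111", "11011", "11010", "01010"],
    ["11111", "01111", "01011", "01010"],
    ["11111", "01111", "01110"],
    ["11111", "11101", "10101"],
    ["11111", "10111"],
]
all_chains_ok = all(is_chain(c) for c in chains)

# 11010 and 01011 are incomparable: neither is >= the other.
incomparable = not geq("11010", "01011") and not geq("01011", "11010")
print(all_chains_ok, incomparable)
```

The incomparable pair shows why the order is only partial: vectors taken from
different chains need not be related at all.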

We focus on special chains of vectors called Hansel chains, which are
described in section 6. The Hansel chains are computed and then aligned
vertically. Procedure P2 applied to MDF moves vectors with regard to Hansel
chains. First, Hansel chains for vectors of size (dimension) n are computed,
and then every vector belonging to a chain is moved to align the
Hansel chain vertically. Hansel chains have different lengths with possible
values from 1 to n + 1 elements. To keep the integrity of the MDF structure, we
have to place these chains so that no elements fall out of the disks. Hence,
the longest chain is placed in the center of the disk and the other
chains are placed alternately to the right and left of the first chain.
Moreover, procedure P2 always assigns a fixed position to each vector.
This position does not change from one dataset to another. This again allows
direct comparison between different Boolean functions. P2 visualizes
to a certain extent the structure of the Boolean function, but it does not
really visualize the border between classes.



4. A METHOD FOR VISUALIZING PATTERN BORDERS



The goal of this VDM method is to show patterns in a simple visual
form. If cases of two classes (say, benign and malignant) are separated in the
visual space and the border between classes is simple, then the goal of VDM
is reached. We want to reveal such a border visually using the technique of
monotone Boolean functions.

Procedures P1 and P2 produce a border that can be very complex and,
thus, not easy to interpret as Figures 4 and 5 indicate. These figures
illustrate the methods for visualizing a data set that can be described by a
simple Boolean function $f(x_1, x_2, \ldots, x_{10}) = x_1$. Figure 4 shows the
system placing elements at a fixed place using MDF and P1. Because elements
have a fixed place, this procedure permits the comparison between multiple
functions. For example, we could superimpose another function in the same
disks with a different color for the bars or we could place it side-by-side
in the second panel.




Figure 4. An MDF using procedure P1 with $f(x_1, x_2, \ldots, x_{10}) = x_1$




Figure 5. An MDF using procedure P2 with $f(x_1, x_2, \ldots, x_{10}) = x_1$

Next, the procedure P2 unveils parts of the structure of the Boolean
function, see Figure 5. Recall that P2 still has a fixed place for each Boolean
vector. Hence, it still permits the comparison of multiple functions. However,
as Figure 5 shows, the border visualized can be very complex.

Procedure P3. Hence, we introduce a third procedure, P3, for MDF. This
procedure tries to move all Hansel chains to the center of the disk. It is
based on the level of the first “1” value in each chain for a given Boolean
function, and on the requirement that the disk architecture should be
preserved. In this way, two different functions will produce distinct
visualizations. Figure 6 demonstrates that the results of P3 more easily
visualize the border between the two classes.




Figure 6. An MDF using procedure P3 with $f(x_1, x_2, \ldots, x_{10}) = x_1$

Here, the concept of the first “1” on the chain means that a chain may
contain elements from both classes, say benign (class “0”), and malignant
(class “1”). Procedure P3 is a derivative of P2. After computing and placing
the vectors using P2, every Hansel chain will be given a value l equal to the
level of the first 1 value present within the chain. Next, every Hansel chain
will be moved so that the chain with the highest l value is located in the
center of the disk so that the MDF structure is kept. Using this procedure, we
are able to group classes within the MDF. Nevertheless, to keep the MDF
structure, the chains have to be placed in a position related to their
length. Unfortunately, this potentially introduces a complex border between
classes because of a possible gap between groups of vectors within the same
class.
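
The value l assigned to each chain can be computed with a single scan from the
bottom of the chain. The following is a minimal sketch on a hypothetical 3-D
example: the three chains are those obtained for n = 3 by the construction of
section 6, the function is f(x) = x1, and the name first_one_level is an
illustrative assumption. The final placement that preserves the disk structure
is not modeled here.

```python
def first_one_level(chain, f):
    """Level (number of 1s) of the lowest vector in the chain with f = 1.

    The chain is listed from its smallest vector upward. Returns None when
    the whole chain belongs to class 0.
    """
    for v in chain:
        if f(v):
            return v.count("1")
    return None

def f(v):
    """Class of a vector for the example function f(x) = x1."""
    return v[0] == "1"

# Hypothetical 3-D example: chains listed from the smallest vector upward.
chains = [["000", "001", "011", "111"], ["100", "101"], ["010", "110"]]

levels = {c[0]: first_one_level(c, f) for c in chains}
# Sorting by l groups the chains by where the class border crosses them.
ordered = sorted(chains, key=lambda c: first_one_level(c, f))
print(levels)
print(ordered)
```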

Procedure P4. Using P3 makes the border obvious, but the border is still
divided into several pieces because of the structure of the MDF itself. We
can see the gaps between the black parts. However, each white element
placed on the top (belonging to class 0) actually expands to an element of the
class. Therefore, the border should be visualized as continuous rather than
interrupted. Since the borders produced by P3 can still be complex and thus
difficult to interpret visually, we introduce the YYF structure along with
procedure P4.

In the YYF, the movement of all chains is based only on the level of the
first 1 in each chain. Additionally, this structure allows filling the gaps
between the groups. The set of chains is sorted from left to right according to
the indicated level, providing a clear, simple border between the two
classes of Boolean vectors. Procedure P4 extends every Hansel chain
created before placing them according to the same method used in P3. The first
step consists in extending the Hansel chain with elements extended in
relation to the edge elements both up and down. The YYF structure shows this
continuous border built using procedure P4, see Figure 7.




Figure 7. A YYF using procedure P4 with $f(x_1, x_2, \ldots, x_{10}) = x_1$



Edge elements are Boolean vectors that form the border between two
classes on the chain. To extend a chain up, we try to find the first vector
belonging to class 1 above the edge element. That is, given the edge vector x
we look for a vector y where y > x. If no such y is found on the level just
above the x level, we then add a vector z from class 0 so that z > x and so
that the path to the first vector y > z of class 1 is minimized.

We repeat these steps until we find a vector y from class 1 and add it
to the chain. To expand a chain down, we apply the same steps, reversing the
relation y > x and swapping the classes 0 and 1. Using this procedure, we
duplicate some of the vectors, so they are displayed more than once.
This is justified because we keep a consistent relative relationship between
the vectors.

Once the chains are expanded, a value l will be assigned to each chain,
just as in the procedure P3. Then, the chains will be sorted with regard to
this value. This approach visualizes a simple border between classes 0 and 1.
In our attempt to visualize the borders between the two classes of elements, we
moved the data out of the MDF structure. In the YYF, vectors are ordered
vertically in the same way as in the MDF but they are not centered anymore.
All vectors are moved with regard to the data in order to visualize the
border between the two classes. In this way, a clear border will appear, class
1 being above class 0, thus giving the YYF the “Yin Yang”-like shape that is
responsible for its name.



5. EXPERIMENT WITH A BOOLEAN DATA SET 

This experiment was conducted with breast cancer data that included
about 100 cases with an almost equal number of benign and malignant
results. Each case was described by 10 binary characteristics (referred to as
“symptoms” for short) retrieved from mammographic X-ray images
[Kovalerchuk, Vityaev & Ruiz, 2001]. The goal of the experiment was to check
the monotonicity of this data set, which is important from both radiological
and visualization viewpoints. Figure 8 shows the initial visualization where
cases are aligned as they were in Figure 4; that is, by allocating cases as
bars at fixed places using MDF and P1. As we mentioned in sections 3 and 4,
this procedure permits the comparison of multiple functions and data sets.
Comparison of Figures 4 and 8 immediately reveals their difference. Figure 8
presents the layered distribution of malignant and benign cases visualized as
bars, where white bars represent the benign cases and black bars represent the
malignant cases.




Figure 8. Breast cancer cases based on characteristics of X-ray images
visualized using fixed location procedure P1

All cases in the same layer have the same number of cancer positive
symptoms, but the symptoms themselves can be different. Light grey areas
indicate monotonic expansion of benign cases to lower layers for each
benign case and dark grey areas indicate monotonic expansion of malignant
cases to upper layers for each malignant case. Figures 9, 10 and 11 are
modifications of Figure 8 based on procedures P2 and P3.





Figure 9. Breast cancer cases visualized using procedure P2

Benign cases are lined up monotonically. That is, each benign case below
a given benign case contains only a part of its positive cancer characteristics.
Similarly, malignant cases (bars) are also lined up monotonically. Thus, a
malignant case above a given malignant case contains more positive cancer
characteristics than the given malignant case. The vertical lines (chains) that
contain both benign and malignant cases are most interesting for further
analysis as we shall see in Figure 11. Figure 11 is a simple modification of
Figure 10 where the cases or areas around the bars are separated by frames.
Figure 12 shows a fragment of the chain from Figure 11 that is rotated 90°.
This chain demonstrates a violation of monotonicity, where after a benign
(white) case on layer 7 we have a “malignant” (grey) case on layer 6
that was obtained via monotonic expansion. It also shows how narrow the
border is between benign and malignant cases: an actual benign case on
layer 7 is immediately followed by two actual malignant (black) cases on
layers 8 and 9 without any gap.



Figure 10. Breast cancer cases visualized using procedure P3




Figure 11. Breast cancer cases visualized using procedure P3 with cases shown
as bars with frames. See also color plates.



Layer 9         Layer 8         Layer 7      Layer 6
1, malignant    1, malignant    0, benign    “1”, “malignant”

Figure 12. Fragment of individual chain with violated monotonicity

Figures 8-12 reveal that there are inconsistencies with monotonicity for
several of the cases. Note, here cases are shown without frames and several
bars may form continuous areas filled with the same color.

The “white” case is an inconsistent case if there are black and dark grey 
bars (areas) above and below it. Similarly the “black” case is inconsistent if 
there are white and light grey cases above and below it. Both types of incon- 
sistent cases are presented in Figure 9. We will call them white and black 
inconsistencies, respectively. 

This visualization permits us to build different monotone Boolean
functions interactively and visually for situations with inconsistencies. The
first way to do this is to find all white inconsistencies and convert all
elements below them to white bars. This process is called a white precedence
monotonization. Similarly, we can use a black precedence monotonization
that converts all white and light grey elements above inconsistent black
cases to black.
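
The two precedence rules can be expressed as a single classification function.
The following is a minimal sketch under stated assumptions: cases are bit
strings labeled 1 (malignant, black) or 0 (benign, white), a vector reached by
both expansions is resolved by the chosen precedence, and the name monotonize
and the two sample cases are hypothetical.

```python
def geq(a, b):
    """a >= b in the component-wise partial order on bit strings."""
    return len(a) == len(b) and all(x >= y for x, y in zip(a, b))

def monotonize(cases, vector, precedence="black"):
    """Class of `vector` after monotone expansion of the labeled cases.

    cases maps bit strings to 1 (malignant, black) or 0 (benign, white).
    Returns None when neither expansion reaches the vector.
    """
    above_black = any(c == 1 and geq(vector, v) for v, c in cases.items())
    below_white = any(c == 0 and geq(v, vector) for v, c in cases.items())
    if above_black and below_white:        # inconsistent region
        return 1 if precedence == "black" else 0
    if above_black:
        return 1
    if below_white:
        return 0
    return None

# Hypothetical cases with one violation: benign 1110 lies above malignant 0110.
cases = {"0110": 1, "1110": 0}
black_result = monotonize(cases, "1110", "black")
white_result = monotonize(cases, "0110", "white")
print(black_result, white_result)
```

Under black precedence the benign case 1110 is relabeled as malignant, while
under white precedence the malignant case 0110 is relabeled as benign.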

If we use the black precedence, then such monotone expansion will cover
100% of the malignant cases. This means that we will have some false
positive cases (benign cases diagnosed as malignant), which of course is better
than having a false negative (cancer cases diagnosed as benign). The latter
occurs when we give precedence to monotonic expansion of benign cases
(white precedence monotonization). If the black precedence monotonization
produces too many false positives, we may check the sufficiency of the
parameters used. The violation of monotonicity may indicate that more
parameters (beyond the 10 parameters used) are needed. The advantage of the
described approach is that we build a visual diagnostic function that is very
intuitive. Inconsistencies can be analyzed globally followed by pulling up
inconsistent cases for further analysis as shown in Figure 12.

Two 3-D versions of the Monotone Boolean Visual Discovery (MBVD)
method (programmed by Logan Riggs) are illustrated in Figures 13 and 14.
The first version uses only vertical surfaces and is quite similar to the 2-D
versions. The second version (Figure 14) uses both vertical and horizontal
surfaces. 3-D versions have several advantages over 2-D versions. The first
one is the ability to increase the dimensionality n that can be visualized. It
is done by using front, back and horizontal surfaces of disks, by grouping
similar Hansel chains, and by showing only “representative” chains in the
global disk views (see Figure 15 for grouping illustration). More detail can
be provided by changing camera location, which permits one to see the back
side of the disks, combined with the semantic zoom that permits one to see
all chains, not only the “representative” ones, when the camera closes up on
the disk.




Figure 13. A 3-D version of Monotone Boolean Visual Discovery with only
vertical surface used



Figure 14. A 3-D version of Monotone Boolean Visual Discovery with vertical
and horizontal surfaces used. See also color plates.

Figure 15. A 3-D version of Monotone Boolean Visual Discovery
with grouping Hansel chains. See also color plates.



6. DATA STRUCTURES AND FORMAL DEFINITIONS



A Boolean vector is an ordered set of n Boolean values 0 and 1. For
instance, a sequence of 10 bits such as 1011100100 is a 10-dimensional
Boolean vector. We can assign a set of properties to a Boolean
vector such as its size (dimension n) and its norm. The level of a Boolean
vector, also referred to as the Boolean norm, is the sum of the components
of the Boolean vector. We use these norms for splitting the set of vectors
into n + 1 levels. For instance, the level of the vector 0000000000 would be
0, whereas the level of the vector 1111111111 would be 10.

Boolean vectors can be represented as vertices of a cube (or a hypercube
if n > 3). For instance, if n = 1, the cube is formed by the two elements {0;
1}. For n = 2, the cube is a square formed by the elements {00; 01; 10; 11}.
Hansel chains [Hansel, 1966; Kovalerchuk, Triantaphyllou, Despande &
Vityaev, 1996] provide a way to browse a hypercube without overlapping.
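
Hansel chains can be generated by a short recursive routine. The sketch below
follows the duplicate-and-cut construction described in this section: each
chain is copied with prefix 0 and with prefix 1, and the largest element of
the 1-prefixed copy is moved to the end of the 0-prefixed copy. Dropping a
copy that becomes empty after the cut is an assumption of this sketch, and the
name hansel_chains is illustrative.

```python
def hansel_chains(n):
    """Build Hansel chains for the n-D binary cube (n >= 1)."""
    chains = [["0", "1"]]                      # the single chain for n = 1
    for _ in range(n - 1):
        grown = []
        for chain in chains:
            low = ["0" + v for v in chain]     # copy with prefix 0
            high = ["1" + v for v in chain]    # copy with prefix 1
            low.append(high.pop())             # cut the maximum of the 1-copy
            grown.append(low)
            if high:                           # assumption: drop empty chains
                grown.append(high)
        chains = grown
    return chains

chains_2 = hansel_chains(2)
chains_10 = hansel_chains(10)
print(chains_2)
print(len(chains_10), sum(len(c) for c in chains_10))
```

Each chain is listed from its smallest vector upward, and every vector of the
cube appears in exactly one chain.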

We build Hansel chains recursively. The Hansel chain for n=\ is the triv- 
ial segment (0; 1). To obtain the Hansel chains for the level 2, we first dupli- 
cate the Hansel chains of the level 1 by generating two identical sets 
G=(0;1), G=(0; 1) and then by adding the prefix of 0 to the first set and then 
by adding the prefix 1 to the second set. This results in two sets 
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£™„=(00;01), £„o,= (10; 11). 

We then cut the maximum element of E^ax and add it to Emm- Thus, the 
Hansel chains for the size 2 (dimension n=2) are 



By repeating those operations of duplicating and cutting for n=3, n=4 
and so on, we will be able to build the Hansel chains for any size vectors. 

A Boolean function f is monotone if for any two Boolean vectors
$x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ such that x
precedes y, that is, $\forall i \in \{1, \ldots, n\}: x_i \le y_i$, we have
$f(y) \ge f(x)$. Such functions divide the set of Boolean vectors into two
classes: vectors assigned to the value 0 and vectors assigned to the value
1, thus forming a border between the two classes.
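
For small n, monotonicity can be checked by brute force over all comparable pairs of vectors. The sketch below is only an illustration of the definition: the helper names `precedes` and `is_monotone` and the two sample functions are assumptions chosen for the example.

```python
from itertools import product


def precedes(x, y):
    """True if x_i <= y_i for every component i."""
    return all(a <= b for a, b in zip(x, y))


def is_monotone(f, n):
    """Brute-force check of f(x) <= f(y) for all pairs where x precedes y."""
    vectors = list(product((0, 1), repeat=n))
    return all(f(x) <= f(y)
               for x in vectors for y in vectors if precedes(x, y))


def f(x):          # x1 x2 OR x3, an illustrative monotone function
    return (x[0] & x[1]) | x[2]


def g(x):          # x1 XOR x2, not monotone
    return x[0] ^ x[1]


print(is_monotone(f, 3), is_monotone(g, 2))   # True False
```

The check compares all pairs of the 2^n vectors, so it is practical only for small dimensions.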

The number of vectors on level $l$ can be obtained by the formula

$$C_n^l = \frac{n!}{l!\,(n-l)!}.$$

For instance, if n = 10 there is 1 element on level 0 and 10 elements on
level 1. Note that the maximum number of elements is reached on level n/2,
with 252 elements for n = 10 ($C_{10}^{5} = 252$). The number of elements
increases from level 0 to level n/2 and then decreases to level n.
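
As a quick numerical check of these level sizes, the following sketch lists the number of vectors on every level for n = 10. The function name `level_sizes` is an assumption; `math.comb` is the standard-library binomial coefficient.

```python
from math import comb


def level_sizes(n):
    """Number of n-dimensional Boolean vectors on each level 0..n."""
    return [comb(n, level) for level in range(n + 1)]


sizes = level_sizes(10)
print(sizes)        # [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]
print(sum(sizes))   # 1024, that is, 2**10 vectors in total
```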



Visual data mining has shown benefits in many areas when used with
numerical data. However, classical VDM methods do not address the specific
needs of binary data. In this chapter we have shown how to visualize real
patterns contained in Boolean data. The first attribute to consider while
ordering Boolean vectors is their norm. Using the Boolean norm of the
vectors, we are able to split the data into n + 1 groups (n being the number
of elements of the vectors). Each group is assigned to a vertical position.
Then, multiple methods can be used to assign the horizontal position.

We created two different structures to handle data: MDF and YYF. In each
structure, the horizontal position of vectors is handled by a specific
procedure. Three procedures are specific to the MDF structure and one is
specific to the YYF structure.

The first procedure, P1, is specific to MDF and orders vectors with regard
to the natural order, converting a Boolean value into a numerical decimal
value. This procedure does not visualize the real structure of the data,
but it permits direct comparison between multiple functions.

The second procedure, P2, is specific to MDF and orders vectors in order
to visualize Hansel chains. This procedure visualizes the structure of the
data itself. However, it does not really visualize relations among data,
although it still permits a direct comparison between several Boolean
functions.

The third procedure, P3, is specific to MDF and orders the Hansel chains.
This procedure unveils the border between the two classes of elements (0
and 1). In order to keep the MDF structure intact, the border is visualized
as being interrupted, and thus differences can be visualized between
multiple Boolean functions. However, in monotone Boolean functions, vectors
belonging to class 0 actually all expand to a vector of class 1. Hence, the
border should be continuous.

The last procedure, P4, is specific to YYF. This procedure visualizes the
real border that exists between the two classes of elements. This is done
by expanding Hansel chains up and down, duplicating some elements in the
process. This new approach has proved to be appropriate for handling the
discovery of patterns in binary data. By further developing these
procedures for non-monotone Boolean functions and data and for k-valued data
structures, we believe that the new approach can be used in a variety of
applications.



8. EXERCISES AND PROBLEMS 

1. Define Hansel chains for n = 3 using the chains for n = 2 and the
recursive procedure described in Section 6.

2. Define Hansel chains for n = 4 using the chains for n = 3 built in
exercise 1 and the recursive procedure described in Section 6.

3. Build a parallel coordinate visualization of the Boolean function
$f(x_1, x_2, x_3) = x_1 x_2 \lor x_3$. Discuss the clarity of this
visualization as a way to separate the set of Boolean vectors {x} with
f(x) = 0 from the vectors with f(x) = 1.

4. Draw Hansel chains for n = 3 and visualize the Boolean function $f(x_1, x_2, x_3)$

= X]X 2 V X 3 X 4 3 by marking each element of each chain with its value “1” or 
“ 0 ”. 
Abstract: With the growing use of geospatial data arising from multiple
sources, combined with a variety of techniques for data generation and a
variety of requested data formats, the problem of data integration poses the
challenging
task of creating a general framework for both carrying out this integration and 
decision making. This chapter features a general framework for combining 
geospatial datasets. The framework is task-driven and includes the develop- 
ment of task-specific measures, the use of a task-driven conflation agent, and 
the identification of task-related default parameters. The chapter also de- 
scribes measures of decision correctness and the visualization of decisions and 
conflict resolution by using analytical and visual conflation agents. 

Key words: Visual decision-making, geospatial data integration, conflation,
task-driven approach, measure of correctness.



1. INTRODUCTION 

In a computer environment, almost any two datasets can be combined 
with the hope that the result is “better” than the sum of the initial datasets. 
This is not always the case, as the appropriateness of the combination
depends upon two primary factors: the quality of the input data and the
task-specific goal of the combination process itself.

The goal of imagery registration is to tie a given image to geospatial
coordinates. The goal of imagery conflation is the correlation and
integration of two or more images or geospatial databases. “The process of
transferring information (including more accurate coordinates) from one
geospatial database to another is known as ‘conflation’” [FGDC, 2000].
Typically, the result of the conflation is a combined image produced from
two or more images with: (1) matched features from different images and (2)
transformations that are needed to produce a single consistent image.
Registration of a new image can be done by conflating it with a registered
image.

Conflation has been viewed as a matching technique that fuses imagery 
data and preserves inconsistencies (e.g., inconsistencies between high and 
low resolution maps, “best map” concept, [Edwards & Simpson, 2002]). This 
approach tries to preserve the pluralism of multi-source data. The traditional 
approach [DBMS, 1998] uses an “artistic” match of elevation edges. If the 
road has a break on the borderline of two maps then a “corrected” road sec- 
tion starts at some distance from the border on both sides and connects two 
disparate lines. This new line is artistically perfect, but no real road may ex- 
ist on the ground in that location (see Figure 1). 




Figure 1. Initial mismatch and “artistic” conflations

Thus, conflation and registration are typical and important parts of the
geospatial decision-making process, which is highly visual by its nature.
The demand for visual geospatial decision making comes from ecology,
geography, geology, archeology, urban planning, agriculture, military,
intelligence, homeland security, disaster relief operations, rescue
missions, and construction tasks. The range of examples is wide and
includes tasks such as:

• assessment of mobility/trafficability of an area for heavy vehicles,

• dynamic assessment of flood damage, and

• assessing mobility in the flood area.

Traditional cartography provided paper maps for analysis and decision- 
making in these domains. Recently there has been a massive proliferation of 
spatial data, and the traditional paper map is no longer the final product of 
cartography. In fact, the focus of cartography has shifted from map produc- 
tion to the management, combination, and presentation of spatial data. Maps 
can be (and often are) produced on-demand for any number of specialized 
purposes. Unfortunately, data are not always consistent. As such, data
combination (or conflation [Jensen, Saalfeld, Broome, Price, Ramsey &
Lapine, 2000; Cobb, Chung, Foley, Petry, Shaw & Miller, 1998; Rahimi, Cobb,
Ali, Paprzycki & Petry, 2002]) has become a significant issue in
cartography, where mathematical methods and advanced visual decision making
must have a profound impact.

This task has a dual interest in the process of visual decision making (the
focus of this book). First, conflation creates a base for making decisions
such as selection of transportation routes and construction sites. Second,
many decisions are made visually in the process of conflation itself.
Conflict resolution in matching features from different sources often
requires visual inspection of maps and imagery. Feature $f_1$ in a
geospatial dataset may have two features $f_2$ and $f_3$ in another dataset
that appear to match $f_1$. An analyst using contextual visual information
may resolve the ambiguity by deciding that $f_1$ should be matched to only
one of them. Such a visual problem-solving process may involve computing
mathematical similarity measures between features, analysis of their names
and other non-spatial attributes, as well as using the analyst’s tacit
knowledge about the context of the task and data.

The conflation problem is discussed in several chapters of this book. This 
chapter provides a wide overview and a conceptual framework including 
visual aspects. A task-driven approach is elaborated in Chapter 18. Chapter
19 describes an algebraic mathematical approach to conflation, and a
combined algebraic, rule-based approach is presented in Chapter 21.



2. COMBINING AND RESOLVING CONFLICTS 
WITH GEOSPATIAL DATASETS 

In essence, the conflation problem is a conflict resolution decision prob- 
lem between disparate data. Inconsistencies in multisource data can be due to 
scale, resolution, compilation standards, operator license, source accuracy, 
registration, sensor characteristics, currency, temporality, or errors [Doyt- 
sher, Filin & Ezra, 2001]. Conflict resolution strategies are highly context 
and task dependent. Jensen et al. [Jensen et al., 2000] discuss the dependency 
of conflation on the task under consideration. 

In solving a conflation problem, experts are unique in extracting and us- 
ing non-formalized context and in linking it with the task at hand (e.g., find- 
ing the best route). Unfortunately, few if any contexts are explicitly formal- 
ized and generalized for use in conflating other images. It is common that the 
context of each image is unique and not recorded. For example, an expert
conflating two specific images may match feature F1 with feature F3,
although the distance between features F1 and F2 is smaller than the
distance between features F1 and F3. The reasoning (that is typically not
recorded) behind this decision could be as follows. The expert analyzed the
whole image as a context for the decision. The expert noticed that both
features F1 and F3 are small road segments and are parts of much larger road
systems A and B that are structurally similar, but features F1 and F2 have
no such link. This conclusion is very specific for a given pair of images
and roads. The expert did not have any formal definition of structural
similarity in this reasoning. Thus, this expert’s reasoning may not be
sufficient for implementing an automatic conflation system. Moreover, the
informal similarity the expert used for one pair of images can differ from
the similarities the same expert might use for two other images.

There are two known approaches for incorporating context: (1) formalize
the context for each individual image and task directly and (2) generalize
the context in the form of expert rules. For the first approach, the
challenge is that there are too many images and tasks and there is
currently no unified technique for context formalization. The second
approach is more general and more feasible, but in some cases may not match
a particular context and task; thus, a human expert still needs to “take a
look.” Visual decision making is necessary.

Consider a simple example. Suppose that the task-specific goal is to
locate individual buildings at a spatial accuracy of ±20 meters. Suppose
further that there are two spatial datasets available - one set with roads
at ±5
meters and the other set containing both roads and buildings at ±50 meters. 
Obviously, neither dataset can properly answer the question. Now suppose 
the two datasets are conflated. Is the spatial accuracy of the new image ±20 
meters or better? If so, the process is a success. If not, the users will have to 
either find new data or just accept the inaccuracies in one dataset. As a side 
note, it is important that the conflation process should not be used to simplify 
datasets (i.e., combine two datasets into one and delete the original data), but 
rather to answer specific questions. 

The conflation task includes: 

• combining geospatial data (typically represented by geospatial 
features), 

• measuring conflict in the combined data, 

• deconflicting the combined data, and

• testing the appropriateness of the conflation relative to the stated 
problem/task definition. 

A single common flexible framework is needed that will integrate diverse
types of spatial data with the following capabilities [Jensen et al.,
2000]: (1) horizontal data integration (merging adjacent data sets), (2)
vertical data integration (operations involving the overlaying of maps),
(3) temporal data integration, (4) handling differences in data content,
scales, methods, standards, definitions, and practices, (5) managing
uncertainty and representation differences, (6) detecting and dealing with
redundancy and ambiguity of representation, (7) keeping some items
unmatched, and (8) keeping some items to be matched with limited
confidence.

Several challenges have been identified in [Mark, 1999]. 

• The representational challenge - finding a way of merging spatial
data from a variety of sources without contradiction. Often this
challenge cannot be fully met.

• The uncertainty challenge - finding a way of measuring, model- 
ing, and summarizing inconsistencies in merged data. Often incon- 
sistencies are inevitable in merging spatial data. 

• The visualization challenge - finding a way to visualize differ- 
ences between different digital representations and real phenomena. 




Figure 2. Framework for task driven conflation process 

Figure 2 illustrates the framework for the overall process. It includes 
three systems: 

(1) System of task-specific measures of correctness of conflation, 

(2) System of task-specific conflation methods, and 

(3) System of visualization and visual correlation tools. 

This design of the system integrates analytical and visual problem solving
processes. We note that different tasks may require different accuracies
and different measures of correctness of the conflation.

The system of task-specific measures of correctness of the conflation
serves as a repository of such measures as a part of the conflation
knowledge base. Different goals along with measures may also require
different conflation methods.

The system of task-specific conflation methods is another component of
the conflation knowledge base. The system of visualization and visual
correlation tools opens the way for visualizing conflicts in the data being
conflated, for portraying relations, and for helping to discover relations.

In Figure 2, a specific task, Task A, is matched to both knowledge bases
(1) and (2) and measures and methods specific for Task A are retrieved. Then 
the specific conflation method is applied to Task A and the result is tested 
using the measure of correctness specific to Task A. If the result is appropri- 
ate for Task A, then it is visualized and delivered to the end user. Otherwise, 
the parameters of the conflation method are modified and the procedure is 
repeated until an acceptable level is achieved. 
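
The control flow described for Task A can be sketched as a simple loop. Everything concrete in the sketch is an assumption made for illustration: the two knowledge bases are plain dictionaries, the toy conflation method pairs one-dimensional coordinates within a search radius, the toy measure of correctness requires all three features to be matched, and the parameter adjustment doubles the radius. None of these choices comes from the framework itself.

```python
def run_task(task, measures_kb, methods_kb, data, params, adjust, max_rounds=10):
    """Apply the task's method, test the result, adjust parameters, repeat."""
    is_acceptable = measures_kb[task]     # knowledge base (1): measures of correctness
    conflate = methods_kb[task]           # knowledge base (2): conflation methods
    for _ in range(max_rounds):
        result = conflate(data, params)
        if is_acceptable(result):
            return result, params         # ready for visualization and delivery
        params = adjust(params)           # modify parameters and try again
    return None, params                   # nothing acceptable: left to the analyst


def match_within_radius(data, params):
    """Toy method: pair each coordinate in A with its nearest one in B."""
    a_points, b_points = data
    pairs = []
    for a in a_points:
        nearest = min(b_points, key=lambda b: abs(a - b))
        if abs(a - nearest) <= params["radius"]:
            pairs.append((a, nearest))
    return pairs


measures_kb = {"Task A": lambda pairs: len(pairs) == 3}   # toy: all 3 features matched
methods_kb = {"Task A": match_within_radius}
data = ([0.0, 10.0, 20.0], [0.5, 11.5, 23.0])

result, params = run_task("Task A", measures_kb, methods_kb, data,
                          {"radius": 1.0},
                          lambda p: {"radius": p["radius"] * 2})
print(result, params)
# [(0.0, 0.5), (10.0, 11.5), (20.0, 23.0)] {'radius': 4.0}
```

The `max_rounds` bound is a design choice of the sketch: when no parameter setting gives an acceptable result, the loop stops and the decision is left to a human analyst.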

The conflation process can use a variety of data along with metadata. The
geometry and topology data classes that provide information on points,
vectors, and structure are the most critical data classes.

Quality metadata include: (1) statistical information such as random er- 
ror, bias characteristics of digital terrain elevation data, and location error for 
a feature, and (2) expert-based information such as “topologically clean,” 
“well matched,” “highly contradictory,” “bias,” “consistent around poly- 
gons,” and “noticeable” (e.g., noticeable can mean edge breaks of approxi- 
mately 1 to 3 vertical units of resolution). 

Typically, data quality is assessed using measures such as accuracy, pre- 
cision, completeness, and consistency of spatial data over space, time, and 
theme. It is identified relative to the database specifications (i.e., if the data- 
base specification states that objects must be located within ±100 meters, and 
all objects are located to that accuracy, the data is 100% accurate). As such, 
appropriate use (or combination) of data is always relative to both the de- 
sired output accuracy stated in the goal and the quality of the input data. 

2.1 Rule-based and task-driven approach 

Typically the conflation process is guided by using If-Then rules. Below
we combine a rule-based approach and a task-driven approach for the
conflation problem by designing a knowledge base. This knowledge base (KB)
contains two types of rules. The first set of rules, $R_G = \{R_{Gi}\}$, is
derived from the user’s task. These rules match features $F_1$ and $F_2$ of
two images $I_1$ and $I_2$,

$R_{Gi}: F_1 \Rightarrow F_2$.

Such $R_{Gi}$ rules are called task-driven rules. Another set of rules in
the KB are the task-free rules that are used when the user is uncertain
about the goals of data conflation and cannot formulate a definitive goal
for conflation. A subset of these rules, $T_G$, can be tested for randomly
selected potential goals and matched by the time user-identified task rules
are ready to be fired. The rule below is an example of a task-free
conflation rule:

IF two candidate features $f_1$ and $f_2$ from image $I_1$ are matched to
the feature of interest $f_3$ in the second image $I_2$

THEN select that feature from $f_1$ and $f_2$ that is closer to $f_3$
according to the Euclidean distance between features:

$\min(D(f_1, f_3), D(f_2, f_3))$.

In order to contrast the two types of rules, let us generate an example of
a task-driven rule by modifying the last rule. For instance, we can add a
requirement to further test that the smaller distance derived above is less
than a threshold $H(A)$ acceptable for the user’s task A; specifically:

$\min(D(f_1, f_3), D(f_2, f_3)) < H(A)$.
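
The contrast between the task-free rule and its task-driven modification can be made concrete with a short sketch. It assumes, purely for illustration, that each feature is reduced to a single representative 2-D point (for example a centroid), so that the Euclidean distance between features is the distance between those points. The function names and the sample coordinates are hypothetical.

```python
from math import dist


def task_free_rule(f1, f2, f3):
    """Select whichever of the candidates f1, f2 is closer to the feature f3."""
    return f1 if dist(f1, f3) <= dist(f2, f3) else f2


def task_driven_rule(f1, f2, f3, threshold):
    """Same selection, accepted only if the distance is below the task threshold."""
    best = task_free_rule(f1, f2, f3)
    return best if dist(best, f3) < threshold else None


f1, f2, f3 = (0.0, 0.0), (3.0, 4.0), (1.0, 0.0)
print(task_free_rule(f1, f2, f3))                    # (0.0, 0.0)
print(task_driven_rule(f1, f2, f3, threshold=2.0))   # (0.0, 0.0)
print(task_driven_rule(f1, f2, f3, threshold=0.5))   # None
```

With the tighter threshold the task-driven rule rejects a match that the task-free rule would accept, which is exactly the difference between the two kinds of rules.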

2.2 Conflation Methods 

Zitova and Flusser [Zitova & Flusser, 2003] survey conflation methods and
classify them according to their nature (area-based and feature-based) and
according to the four basic steps of image registration: feature detection,
feature matching, mapping function design, and image transformation and
resampling. This comprehensive review concludes with
the following statement: “Although a lot of work has been done, automatic 
image registration still remains an open problem. Registration of images with 
complex nonlinear and local distortions, multimodal registration, and regis- 
tration of N-D images (where N > 2) belong to the most challenging tasks at 
this moment. . .The future development on this field could pay more atten- 
tion to the feature-based methods, where appropriate invariant and modality- 
insensitive features can provide good platform for the registration. In the fu- 
ture, the idea of an ultimate registration method, able to recognize the type of 
given task and to decide by itself about the most appropriate solution, can 
motivate the development of expert systems. They will be based on the com- 
bination of various approaches, looking for consensus of particular results.” 
Zitova & Flusser’s vision of the future reflects the process depicted in Figure 
2. 

The intent here is not to summarize the more than 1000 papers on
registration and conflation published in the last 10 years nor to repeat
the excellent surveys [Zitova & Flusser, 2003; Brown, 1992], but to review
some of the
techniques developed by the image processing community for integration of 
raster and vector images relevant to this book. It is also important to note the 
significant activities of medical imaging community in image integra- 
tion/registration such as the Second International Workshop on Biomedical 
Image Registration (WBIR'03) in Philadelphia. Below we list some represen- 
tative methods from recent research on registration and spatial correspon- 
dence presented at this workshop for rigid and non-rigid image registration 
based on: vector field regularization, curvature regularization, spatio- 
temporal alignment, entropy, mutual information, variational curve match- 
ing, normalized mutual information, K-means clustering, shading correction, 
piecewise affine transformation, elastic transformation, multiple channels, 
modalities and dimensions, similarity measures, the Kullback-Leibler dis- 
tance, block-matching features, voxel class probabilities, intensity-based 2D- 
3D spines, orthogonal 2D projections. Despite similarities there are signifi- 
cant differences between medical and geospatial imagery. Medical imagery 
is produced in a more controlled environment with less dynamics and vari- 
ability in resolution but often more metadata. 

Hirose et al. [Hirose, Furuhashi, Kitamura & Araki, 2001] do not assume
that fiducial points are known. Rather they automatically extract four corre- 
sponding points from images. These points are used to derive an affine trans- 
formation matrix, which defines a mutual position relationship between two 
consecutive range images. Such images can then be concatenated using the 
derived affine transformation. 

Wang et al. [Wang, Chun & Park, 2001] use GIS-assisted background 
registration that discerns different clutter regions in the initial image frame 
by using a feature vector composed of vertical and horizontal autocorrela- 
tion. The authors also build filters tuned to each class. In the successive 
frames, they classify each region of different clutter from a contour image 
obtained by projecting the GIS data and by registering it to the previous im- 
age. 

Finally, the reader is directed to [Bartl & Schneider, 1995] who demon- 
strate that knowledge of the geometrical relationship between images is a 
prerequisite for registration. Assuming a conformal affine transformation, 
four transformation parameters are determined on the basis of the geometri- 
cal arrangement of characteristic objects extracted from images. An algo- 
rithm is introduced that establishes a correspondence between (centers of 
gravity of) objects by building and matching so-called angle chains, a linear 
structure for representing a geometric (2D) arrangement. 
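
To see why a conformal affine transformation has exactly four parameters, it helps to write a 2-D point as a complex number p. The transformation is then q = alpha * p + beta, where the complex number alpha carries scale and rotation and beta carries the translation. The sketch below recovers alpha and beta from two corresponding points, such as two matched centers of gravity. It illustrates the parameter count only and is not the angle-chain algorithm of Bartl and Schneider; the function names are assumptions.

```python
import cmath


def conformal_from_two_pairs(p1, q1, p2, q2):
    """Return (alpha, beta) such that q = alpha * p + beta for both point pairs."""
    alpha = (q1 - q2) / (p1 - p2)     # scale and rotation
    beta = q1 - alpha * p1            # translation
    return alpha, beta


def apply_transform(alpha, beta, p):
    """Map a point p (a complex number) with the conformal transformation."""
    return alpha * p + beta


# Two matched object centers: (0, 0) -> (1, 1) and (1, 0) -> (1, 3).
alpha, beta = conformal_from_two_pairs(0 + 0j, 1 + 1j, 1 + 0j, 1 + 3j)
scale, rotation = abs(alpha), cmath.phase(alpha)
print(scale, rotation)                       # scale 2, rotation of 90 degrees (pi/2)
print(apply_transform(alpha, beta, 0 + 1j))  # a third point mapped by the same transformation
```

The four real parameters are the real and imaginary parts of alpha and beta, or equivalently the scale, the rotation angle, and the two translation components.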

Several other related areas that face similar challenges are image
retrieval from multimedia databases that use image content [Shih, 2002],
multimedia data mining [Perner, 2002], and information extraction from
heterogeneous
sources [Ursino, 2002]. Progress in each of these areas along with progress 
in image integration should prove to be mutually beneficial. 

Pope and Theiler [Pope & Theiler, 2003] and Lofy [Lofy, 2000] apply 
image photogrammetric georeferencing using metadata about sensors com- 
bined with the edge extraction and matching at three levels of resolution. 
The last of these systems has been used to automatically register synthetic 
aperture radar (SAR), infrared (IR), and electro-optical (EO) images within a 
reported 2-pixel accuracy. 

Growe and Tonjes [Growe & Tonjes, 1997] present a rule-based ap- 
proach to imagery registration for automatic control point matching when 
flight parameters are inaccurate. Prior knowledge is used to select an appro- 
priate structure for matching, i.e. control points from a GIS database, and to 
extract their corresponding features from the sensor data. The knowledge is 
represented explicitly using semantic nets and rules. The automatic control 
point matching is demonstrated for crossroads in aerial and SAR imagery. 

A recent special issue of Computer Vision and Image Understanding 
Journal [Terzopoulos, Studholme, Staib & Goshtasby, 2003] is devoted to 
non-rigid image registration based on point matching, distance functions, 
free boundary constraints, iconic features and others. The next step in geo- 
spatial data registration is video registration [Shah & Kumar, 2003], which 
faces many of the same challenges as static image registration discussed 
above. 

The new use of algebraic invariants described in Chapter 19 is a very 
general image conflation method. For example, it can be used with digitized 
maps (e.g. in USGS/NIMA vector format), aerial photos and SRTM data 
that have no common reference points established in advance. The images
may have different (and often unknown) scales, rotations, and accuracy.

The method assumes that map, aerial photo, or SRTM data images each have
at least 5 well-defined linear features that can be presented as polylines
(continuous chains of linear intervals). A feature on one image might be
only a portion of the same feature on another image. Also, features might
overlap or have no match at all. It is further assumed that these
well-defined linear features can be extracted relatively easily. The major
steps of the method are:

Step 1. Extract linear features as sets of points (pixels), S.

Step 2. Interpolate these sets of points S as a specially designed polyline.

Step 3. Construct a matrix P of the relation between all lengths of
intervals on the polyline (see below for more details). These matrices
are computed for all available polylines.

Step 4. Construct a matrix Q of the relation between all angles on the
polyline. These matrices are computed for all available polylines.

Step 5. Search for common submatrices in the set of matrices P and compute
a measure of closeness.

Step 6. Search for common submatrices in the set of matrices Q and compute
a measure of closeness.

Step 7. Match features using the closest submatrix.
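
Steps 3 and 4 can be illustrated with a small sketch. The exact "relation" stored in P and Q is defined in Chapter 19; here we assume, only for illustration, that P holds ratios of interval lengths and Q holds differences of turning angles between consecutive intervals. Under that assumption both matrices are unchanged when a polyline is scaled, rotated, and shifted, which is the property that makes them usable without common reference points. The submatrix search of Steps 5 to 7 is not shown, and all function names are hypothetical.

```python
from math import atan2, cos, hypot, pi, sin


def intervals(polyline):
    """Vectors of the consecutive linear intervals (distinct points assumed)."""
    return [(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(polyline, polyline[1:])]


def length_matrix(polyline):
    """P[i][j] = length of interval i divided by length of interval j."""
    lengths = [hypot(dx, dy) for dx, dy in intervals(polyline)]
    return [[a / b for b in lengths] for a in lengths]


def angle_matrix(polyline):
    """Q[i][j] = difference between turning angles i and j."""
    headings = [atan2(dy, dx) for dx, dy in intervals(polyline)]
    turns = [(h2 - h1 + pi) % (2 * pi) - pi       # wrapped to [-pi, pi)
             for h1, h2 in zip(headings, headings[1:])]
    return [[a - b for b in turns] for a in turns]


def transform(polyline, scale, angle, shift):
    """Scale, rotate, and translate a polyline."""
    c, s = scale * cos(angle), scale * sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1])
            for x, y in polyline]


def max_difference(m1, m2):
    """Largest absolute entry-wise difference between two equal-size matrices."""
    return max(abs(a - b) for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))


road = [(0, 0), (2, 0), (2, 1), (5, 3)]
same_road = transform(road, scale=3.0, angle=0.7, shift=(100, -40))

# Both differences are zero up to floating-point rounding.
print(max_difference(length_matrix(road), length_matrix(same_road)))
print(max_difference(angle_matrix(road), angle_matrix(same_road)))
```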

At first glance, it is not clear how useful a visual approach can be for al- 
gebraic, rule-based and task-driven problem solving. The algebraic approach 
described in Chapter 19 benefits from visuals by having visual spatial
relations as a source of intuitive problem understanding. It provides
insight for discovering an algebraic formalization of the human conflation
process. For instance, the matrix for relations between angles of the line
segments captures
a human way of analyzing similarities between two lines by noticing that 
relations between angles are similar in both lines. This insight is used in the 
algebraic approach described in Chapter 19. 



2.2.1 Are topological invariants invariant for the conflation task? 

Recently at the NSF-funded Workshop on Computational Topology, Bern et al.
[Bern, Eppstein, Agarwal, Amenta, Chew, et al., 1999] identified several
major shape identification and recognition problems. Two of these topics
are most relevant to our main problem of interest, matching/correlating
spatial objects, namely, Qualitative Geometry and Multiscale Topology, and
Topological Invariants. The reason is that often it is impossible to
conflate maps and data while preserving all local geometrical properties.
The hope is that some more global shape properties can be preserved,
discovered, and used for matching spatial objects. Obviously, topological
properties present some global invariants. They may also provide a more
meaningful description than geometric measures [Bern et al., 1999].
However, “Topological invariants such as Betti numbers are insensitive to
scale, and do not distinguish between tiny holes and large ones. Moreover,
features such as pockets, valleys, and ridges - which are sometimes crucial
in applications - are not usually treated as topological features at all”
[Bern et al., 1999].

Two ideas have been generated to deal with this important problem [Bern 
et al., 1999]: 

• Using topological spaces naturally associated with a given surface to 
capture scale-dependent and qualitative geometric features. For example, 
the lengths of the shortest linking curves [Dey & Guha, 1998] or closed 
curves through or around a hole, which can be used to distinguish small 
holes from large ones. 

• Using the topology of offset or “neighborhood” surfaces for classifying
depressions in a surface. For example, a sinkhole with a small opening 
will seal off as the neighborhood grows, whereas a shallow puddle will 
not. 

The first idea uses highly scale-dependent, geometric characteristics such as 
length. This idea has an obvious limitation for the conflation problem. Dif- 
ferent and unknown scaling of maps, aerial photographs, and sensor data can 
distort sizes - a large hole could appear to be small, while a small hole could 
appear to be large. The second idea just shifts the same limitation to 
“neighborhood” surfaces. 

A fundamentally new mathematical approach is needed for solving the 
conflation problem. The algebraic approach described below presents such a
new approach. Algebraic and differential geometry ideas are central to this
approach and include establishing homomorphisms and homeomorphisms 
between algebraic systems. Further, this approach solves the conflation prob- 
lem for a wide range of realistic settings. The algebraic approach introduces 
a new class of invariants which are much more robust than common geomet- 
ric characteristics such as coordinates and lengths. In addition, they detect 
specific geospatial features better than common topological invariants. Ob- 
viously common topological invariants were not designed specifically for 
solving the conflation problem. 

It was noticed in [Bern et al., 1999] that “the most useful topological in- 
variants involve homology, which defines a sequence of groups describing 
the ‘connectedness’ of a topological space. For example, the Betti numbers
of an object embedded in $\mathbb{R}^3$ are respectively the number of
connected components separated by gaps, the number of circles surrounding
tunnels, and
the number of shells surrounding voids. Technically, the Betti numbers are 
the ranks of the free parts of the homology groups. For 2-manifolds without 
boundary, the homology can be computed quite easily by computing Euler 
characteristics and orientability.” 

The number of connected components (in the same drainage system) can 
vary significantly between different maps, aerial photographs, and sensor 
data for the same area. It depends on characteristics such as human error, 
map resolution, sensor capabilities and parameters, obstacles (e.g. clouds), 
data processing methods. Similar problems exist for other topological num- 
bers. Thus, in general, they are not invariant for the conflation problem. 
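
A small sketch makes this lack of invariance tangible. The same hypothetical drainage system is encoded twice as a graph: once in detail, and once as it might appear on a coarser map where two short segments were not captured. The number of connected components (the zeroth Betti number) is counted with a union-find structure. The node numbering and both edge lists are invented for the example.

```python
def count_components(num_nodes, edges):
    """Number of connected components of an undirected graph (union-find)."""
    parent = list(range(num_nodes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for a, b in edges:
        parent[find(a)] = find(b)           # merge the two components
    return len({find(i) for i in range(num_nodes)})


# Six junctions of one drainage system, as seen on two different sources.
detailed_map = [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5)]
coarse_map = [(0, 1), (3, 4), (2, 5)]       # segments (1, 2) and (2, 3) are missing

print(count_components(6, detailed_map))    # 1
print(count_components(6, coarse_map))      # 3
```

The two sources describe the same object on the ground, yet the computed number of components differs, so a matcher relying on this number alone would treat them as different objects.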

Bern et al. [Bern et al., 1999] suggested that the estimation of topological 
invariants may be appropriate if it is not possible to determine the topology 
of an object completely. Unfortunately, in the case of a drainage system this 
estimate can be far from useful. A given drainage system might have 10 or 
even 100 connected elements on some maps, but only one or two connected 
elements on another. In general, objects such as drainage systems, road
systems, and/or lakes can be presented without some of their components,
such as individual canals, roads, and islands, which can change the
calculation of their topological invariants. Further study is needed to
determine the extent to which Betti numbers and Euler characteristics would
be useful for matching incompletely defined spatial objects. Similarly, the
extent to which Betti numbers and Euler characteristics would be useful for
matching spatial objects that are defined with errors, such as roads and
canals with incorrect connections, needs further consideration.

2.2.2 Rubber sheeting problem: definitions 

Dey et al. [Dey, Edelsbrunner & Guha, 1999] address two important 
computational topology problems in cartography: (1) rubber sheets and (2) 
cartograms. These problems involve two purposeful deformations of a geo- 
graphic map: 

(1) bringing two maps into correspondence - (e.g. two geographic maps 
may need to be brought into correspondence so that mineral and agricul- 
tural land distributions can be shown together) and 

(2) deforming a map to reflect quantities other than geographic distance and 
area (e.g., population). 

Both tasks (1) and (2) belong to the group of problems that consist of 
matching similar features from different databases. However, there is an im- 
portant difference between them and our spatial object correlation task. 

Dey and Guha [Dey & Guha, 1998] assume that n reference matched
points are given between the two maps for rubber sheeting: “To model this
problem let $P \subseteq M$ and $Q \subseteq N$ be two sets of n points each together with a
bijection $b: P \to Q$. The construction of a homeomorphism $h: M \to N$ that
agrees with b at all points of P is popularly known as rubber sheeting
[Gillman, 1985]. Suppose K and L are simplicial complexes whose simplices
cover M and N: $M = |K|$ and $N = |L|$. Suppose also that the points in P and
Q are vertices of K and L: $P \subseteq \mathrm{Vert}\,K$ and $Q \subseteq \mathrm{Vert}\,L$, and that there is a
vertex map $v: \mathrm{Vert}\,K \to \mathrm{Vert}\,L$ that agrees with b at all points of P. The
extension of v to a simplicial map $f: M \to N$ is a simplicial homeomorphism
effectively solving the rubber sheet problem.”
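
The extension of a vertex map to a simplicial map can be illustrated on a single triangle: a point keeps its barycentric coordinates while the triangle's vertices move to their matched positions. The sketch below is a minimal illustration under that assumption; the names `barycentric` and `map_point` and the sample triangles are hypothetical, and a full rubber-sheeting implementation would also need the triangulations K and L and a point-location step.

```python
def barycentric(p, tri):
    """Barycentric coordinates of point p with respect to triangle tri."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    l1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    l2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return l1, l2, 1.0 - l1 - l2


def map_point(p, src_tri, dst_tri):
    """Map p from src_tri to dst_tri, keeping its barycentric coordinates.

    The vertex map is given implicitly by position: src_tri[i] -> dst_tri[i].
    """
    weights = barycentric(p, src_tri)
    x = sum(w * v[0] for w, v in zip(weights, dst_tri))
    y = sum(w * v[1] for w, v in zip(weights, dst_tri))
    return x, y


# Hypothetical matched control points on two maps.
source = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
target = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
print(map_point((0.25, 0.25), source, target))
```

Because every vertex maps exactly to its counterpart, adjacent triangles that share an edge agree along that edge, which is what makes the piecewise-linear map continuous.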

Dey and Guha [Dey & Guha, 1998] also review variations of the
construction of such complexes K and L. They note that [Aronov, Seidel &
Souvaine, 1993] consider simply connected polygons M and N with n vertices
each and show that there are always isomorphic complexes $|K| = M$ and
$|L| = N$ with at most $O(n^2)$ vertices each. They also prove that sometimes
$\Omega(n^2)$ vertices are necessary and they show how to construct the complexes
in $O(n^2)$ time. In addition, Dey notes that [Gupta & Wenger, 1997] solve the
same problem with at most $O(n + m \log n)$ vertices, where m is the
minimum number of extra points required in any particular problem instance.






2.2.3 Related computational topology problems

Below we discuss the relationship between our framework and the tasks
discussed by [Bern et al., 1999].

Shape Reconstruction from Scattered Points is an important part of our
algebraic approach. Methods for reconstructing a linear feature shape using a
criterion of maximum local linearity can be developed using this approach.

Shape Acquisition. Matching/correlating spatial objects requires shape
acquisition from an existing physical object. We believe the solution of this
problem depends on developing a formal mathematical definition for features
such as roads and drainage systems as complex objects in terms that
coincide with the USGS/NIMA Topological Vector Profile (TVP) concept.

Shape Representation. Bern et al. [Bern et al., 1999] list several repre- 
sentation methods: unstructured collections of polygons (“polygon soup”), 
polyhedral models, subdivision surfaces, spline surfaces, implicit surfaces, 
skin surfaces, and alpha shapes. 

In light of this, we believe that an algebraic system representation such 
as that described below, which includes scale-dependent but robust invari- 
ants, axioms and permissible transformations should be developed. 

Topology Preserving Simplification. It is critical for many applications 
to be able to replace a polygonal surface with a simpler one. However, such a 
process is “notorious for introducing topological errors, which can be fatal 
for later operations” [Bern et al., 1999]. 

A generic approach developed in [Cohen, Varshney, Manocha, Turk, 
Weber, Agarwal, et al., 1996] can be used where a simplified 2-manifold is 
fitted into a shell around the original. In addition, we are developing a
specific method for the simplification of linear features critical for the algebraic
approach using similar ideas.

Our goal is to enforce algebraic invariants using simplification. The 
main idea is that the simplification criteria should not be completely defined 
in advance but be adjusted using extensive simulation experiments and 
machine learning tools to identify the practical limits of robustness of the
algebraic invariants. 

This includes the classification of linear features to identify simplification 
options. For example, let a be a linear feature, b its simplification, and $\lambda$ a
measure of the closeness between a and b with $\lambda(a, b) < \delta$, where $\delta$ is a limit of
acceptable simplification. This limit can be developed as an adjustable
parameterized function of a feature a, $\delta = \delta(a)$.
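
As a minimal sketch of such a check, the code below accepts a simplification b of a linear feature a only when a closeness measure stays below a feature-dependent limit. Both choices are assumptions made for illustration: closeness is taken as the largest distance from a vertex of a to the polyline b, and the limit is taken as a fixed fraction of the length of a. The names `closeness`, `limit`, and `is_acceptable` are hypothetical.

```python
import math


def point_segment_distance(p, s, e):
    """Euclidean distance from point p to the segment from s to e."""
    dx, dy = e[0] - s[0], e[1] - s[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.dist(p, s)
    t = ((p[0] - s[0]) * dx + (p[1] - s[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.dist(p, (s[0] + t * dx, s[1] + t * dy))


def closeness(a, b):
    """Assumed measure: farthest vertex of a from the polyline b."""
    segments = list(zip(b, b[1:]))
    return max(min(point_segment_distance(p, s, e) for s, e in segments)
               for p in a)


def limit(a, fraction=0.05):
    """Assumed limit: a fraction of the length of feature a."""
    return fraction * sum(math.dist(p, q) for p, q in zip(a, a[1:]))


def is_acceptable(a, b, fraction=0.05):
    return closeness(a, b) < limit(a, fraction)


# Hypothetical features: a gently bent road and a sharply bent road,
# each simplified to the same straight segment.
straight = [(0.0, 0.0), (10.0, 0.0)]
gentle = [(0.0, 0.0), (5.0, 0.2), (10.0, 0.0)]
sharp = [(0.0, 0.0), (5.0, 2.0), (10.0, 0.0)]
print(is_acceptable(gentle, straight), is_acceptable(sharp, straight))
```

The `fraction` parameter plays the role of the adjustable part of the limit; in the approach described above it would be tuned through simulation experiments rather than fixed in advance.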

3. MEASURES OF DECISION CORRECTNESS

A wide range of measures of the correctness of results has been developed.
They need to be analyzed to build a common ground for geospatial
decision making. Systems can be classified by how exactly they define and
present measures of closeness and their confidence, as follows:

• High level. A system defines an exact and clear measure of closeness at the
design stage.

• Medium level. A system does not measure closeness of spatial objects in 
advance, but provides a user with interactive tools such as the curve- 
matching cursor. 

• Low level. A system mostly relies on an informal human perceptual 
measuring mechanism, providing similar graphical presentation of enti- 
ties and some pointing mechanisms between them. 

A major concern with the first approach (referred to as the high level above)
is in the nature of a measure of closeness as a single numeric indicator. If we
try to summarize the closeness of two data sets with 1000 points each, this
measure may capture:

• average closeness - the averaged distance between entities using some 
formal definition of the measure of closeness between individual points, 

• optimistic closeness - a measure with higher weights on smaller discrep- 
ancies in distances between entities using some formal definition of the 
measure of closeness between individual points, 

• pessimistic closeness — a measure with higher weights on larger discrep- 
ancies in distances between entities using some formal definition of the 
measure of closeness between individual points. 
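
The three alternatives can be sketched as weighted means of the point-to-point distances. The rank-based weights below are only one hypothetical choice among many: the optimistic form gives the smallest discrepancy the largest weight, and the pessimistic form does the opposite.

```python
def closeness_summary(distances):
    """Return (optimistic, average, pessimistic) summaries of distances.

    Assumed weighting: distances are sorted in ascending order and
    weighted by rank, which is an illustrative choice only.
    """
    d = sorted(distances)
    n = len(d)
    if n == 0:
        raise ValueError("at least one distance is required")
    total_weight = n * (n + 1) / 2
    average = sum(d) / n
    # Pessimistic: larger discrepancies receive larger weights.
    pessimistic = sum((r + 1) * x for r, x in enumerate(d)) / total_weight
    # Optimistic: smaller discrepancies receive larger weights.
    optimistic = sum((n - r) * x for r, x in enumerate(d)) / total_weight
    return optimistic, average, pessimistic


# Hypothetical discrepancies (in meters) between matched points.
print(closeness_summary([1.0, 2.0, 9.0]))
```

With this weighting the optimistic value never exceeds the average and the average never exceeds the pessimistic value, so a single outlier such as the 9 m discrepancy is visible in one summary and nearly hidden in another.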

Probability theory, mathematical statistics, functional analysis, fuzzy logic, 
machine learning, and pattern recognition provide plenty of examples of 
measures for all three alternatives and many intermediate measures between 
them. The problem is that each of them hides/ignores some of the distortions, 
which may be critical for a specific task. A custom design of measures of 
closeness for a specific task and spatial entities falls into another trap — loss 
of the universality of the approach and tools. In this case, there is the need 
for highly task-specialized (unique) measure designs. Successful introduction
of measures of closeness helps to evaluate conflation process quality
and to resolve ambiguities in matching features.

Multidimensional measures of conflation correctness and their visual 
presentation are a way to meet these challenges. Figure 3 illustrates a 3- 
dimensional measure with three components: (1) optimistic (short arrow), (2) 
average (medium arrow) and (3) pessimistic (large arrow). We also utilize 
features such as location, rotation, blinking, and lighting to present
contradictory characteristics of conflated maps. Figure 3 visualizes a conflation
procedure, showing directions of movement and matching lines for moving
the upper map to conflate the two maps. This visualization can also serve as
a guide in manual visual conflation when, for instance, automatic
conflation fails.




Figure 3. Multidimensional measures of correctness 

Figure 3 uses a contradiction symbol to portray contradictory breaks in elevation
lines. This is an example of our matching attribute approach: if two objects a
and b contain contradictory data, then a is associated with the attribute
"Contradicts b" and b is associated with the attribute "Contradicts a". Both
attributes are portrayed by the same symbol with the same color, and this
symbol is attached to both objects. Similarly, if two other objects
contradict each other, then the same symbol is attached to both objects.

Contour lines match on the left side of Figure 3, but are shifted by one. 
An examination of this limited area could lead to the mistaken conclusion 
that the conflation was better than it really is. This visualization makes the 
conflation decision-making process apparent. 

To distinguish two pairs of contradictory objects, the symbol of contra- 
diction is used with an attribute of a different color for these pairs. Thus, 
matching attributes support visual correlation between contradictory objects. 
For portraying the magnitude of contradiction between spatial objects, we 
use measures of conflation correctness (including fuzzy logic measures). The 
symbol of contradiction is rotated in accordance with this measure value: the 
larger the rotation angle (up to 180°), the larger the contradiction.

A conflation procedure will be called a consistent conflation procedure if
the resulting combined map preserves relative distances between elevation
lines and objects on each map, preserves absolute elevations and locations on
the border connecting the maps, and avoids discontinuity of the objects on the
border.

When consistent conflation is impossible, some non-linear
distorting conflation methods are used as part of the USGS standard. These
methods differ in the number of neighboring elevation profiles involved in 
interpolation. The number of profiles depends on the number of vertical reso- 
lution units where the edge breaks. For such situations, measures of correct- 
ness of conflation are compositions of: 

• measures of the distortion of relative distances on both maps, 

• measures of the distortion of absolute elevations and locations of 
objects on the edge and on the interpolated profiles, 

• measures of the discontinuity on the border, and 

• measures of the distortion of topology of objects due to compos- 
ing two maps. 

We represent each of these measures in one of the three measure forms
described above: pessimistic, optimistic, and average.

Traditionally, all measures use a Euclidean separation distance between
parts of the spatial object. For instance, the standard measure of vertical
distortion used in digital elevation models (DEM) is the root-mean-square
error (RMSE) statistic, E, an average Euclidean closeness between elevation
data. Two thresholds, $T_1$ and $T_2$, are then set up for this error statistic:

If $E \le T_1$, then it is desirable to use the conflated data;

If $T_1 < E \le T_2$, then the conflated data can be retained temporarily until
better conflated data become available;

If $E > T_2$, then the conflated data are rejected.

For instance, the US Geological Survey provides thresholds $T_1$ = 7 m and
$T_2$ = 15 m for the 7.5-minute Digital Elevation Model. These thresholds are
not flexible and are task-independent. If $T_1$ = 7.4 m is desirable, why then
would E = 7.6 m not be desirable? In essence, the answer should depend on
the task: for one task it might be desirable, for another it might not.
Fuzzy logic membership functions provide more flexibility in setting
thresholds and accommodating task specifics. Figures 4 and 5 translate thresholds
$T_1$ and $T_2$ into a fuzzy logic format. The first membership function might
represent the desired RMSE. Similarly, the second and third membership
functions might represent the temporarily retained and rejected values
of RMSE, respectively. Figure 5 may better fit practice when there is not
much difference between 6.9 m and 7.0 m RMSE, in contrast with the sharp
border at 7.0 m in Figure 4.

[Figure 4 plot: 7.5-minute DEM elevation Root-Mean-Square Error (RMSE), in meters; membership functions: desired, retained temporarily, rejected]

Figure 4. Use of fuzzy logic membership functions - sharp distinctions

Figure 5 shows a more flexible version of these functions with wider sets 
of uncertain values between desired, retained temporarily, and rejected val- 
ues of RMSE, which can be more realistic measures in some tasks. 



[Figure 5 plot: Adjusted RMSE; membership functions: desired, retained temporarily, rejected]

Figure 5. Use of fuzzy logic membership functions - more relaxed distinctions
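
The contrast between sharp and relaxed distinctions can be sketched in code. The thresholds of 7 m and 15 m are the USGS values quoted above; everything else is an assumption made for illustration, including the inclusive boundaries of the crisp rule, the linear shape of the fuzzy transitions, and the `width` parameter that controls how wide each transition zone is.

```python
T1, T2 = 7.0, 15.0  # USGS thresholds in meters, as quoted in the text


def crisp_label(rmse):
    """Sharp distinctions: every RMSE value gets exactly one label."""
    if rmse <= T1:
        return "desired"
    if rmse <= T2:
        return "retained temporarily"
    return "rejected"


def fuzzy_memberships(rmse, width=1.0):
    """Relaxed distinctions with linear transitions around T1 and T2.

    width is a hypothetical half-width (in meters) of each transition
    zone; it must be positive and small enough that the zones around
    T1 and T2 do not overlap.
    """
    def ramp_down(x, center):
        low, high = center - width, center + width
        if x <= low:
            return 1.0
        if x >= high:
            return 0.0
        return (high - x) / (high - low)

    desired = ramp_down(rmse, T1)
    rejected = 1.0 - ramp_down(rmse, T2)
    retained = max(0.0, 1.0 - desired - rejected)
    return {"desired": desired,
            "retained temporarily": retained,
            "rejected": rejected}


for value in (6.9, 7.1):
    print(value, crisp_label(value), fuzzy_memberships(value))
```

Under the crisp rule 6.9 m and 7.1 m receive different labels, while the fuzzy version assigns both values similar partial memberships in "desired" and "retained temporarily". A task-specific knowledge base could store a different `width` (or a different function shape) for each task.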

These membership functions are constructed and stored for each specific 
task in the conflation knowledgebase providing task-specific formalizations 
of concepts “desired”, “retain temporarily”, and “reject.” Different tasks may 
have different levels of desired distinction and their definitions, as we have 
illustrated in Figures 4 and 5. The value of RMSE is a property of the data
and exists independently of the task, but its evaluation is task-specific. A
fuzzy logic approach also permits introducing a hierarchy of fuzzy logic
membership functions to capture a variety of task-specific evaluations. Next,
we consider a more general approach based on context spaces [Kovalerchuk
& Vityaev, 2000] that can capture a richer collection of task specifics and
context.



4. VISUALIZATION 



There are several stages in the problem solving process: discovering,
implementing (in hardware and software), using, and presenting results. For the
whole process to be visual, all major steps should be visual. A
complete visual step is performed visually, represented visually (visualized), and
animated when dynamic steps are considered.

Most visualization activities concentrate on two stages: using a
well-defined process for solving tasks and representing results [Mille, 2001].
Visualization of these stages permits speeding up the process and assures
quality control. Examples are abundant, e.g., finding North by using stars
such as Polaris and the stars of the Big Dipper and Little Dipper. The animation
of the Pythagorean Theorem proof shown in Figure 4 in Chapter 1 provides another
example of a visual process. The first example has been very useful for
solving the navigation problem for centuries and the second example has been
used in education for two millennia. AutoCAD provides examples of a visual
implementation stage. Much less has been done for visualizing process
discovery. Typically, at this stage, visuals serve the role of an informal insight
for process algorithm development. There is historical evidence that such
insights played a critical role in the whole problem solving process. Consider
again Albert Einstein’s evidence quoted in Chapter 3. This is the most
challenging and creative stage of problem solving. Visual and spatial data
mining have recently emerged as major tools for this stage.

Modern geospatial studies form a natural domain for advanced visual
decision-making techniques where typical spatial relations between objects
such as larger, smaller, above, and below are visual by their nature. Cognitive
science research reviewed in Chapter 3 indicates that human reasoning with
such spatio-visual relations is more efficient than with relations that are
only visual (e.g., cleaner versus dirtier).

As an illustration, consider two vector image data sets consisting of 1497 
line segments and 407 line segments respectively. Are these two images of 
the same scene? Figures 6 and 7 display both the data sets and the parts of 
the sets that are held in common and those that are unique. 

Figure 6. Vector image data sets of 1497 and 407 line segments




Figure 7. Common (a) and unique (b) segments for the vector data sets in Figure 6. These
segments were found using methods described in Chapter 19. 



The number of common polylines supports the conclusion that these are 
two images of the same scene. The number of polylines not in common illus- 
trates the need for additional information to know whether these are the re- 
sult of incomplete or faulty feature spaces, different image resolutions, or 
changes in the scene between the times of acquisition. With additional in- 
formation from the metadata associated with this vector data or perhaps from 
other sources, it may be possible to quantify the differences that this visuali- 
zation makes clear. 

A tight link between visual decision-making and conflation is clarified
when we notice that the decision-making task provides a specific goal for
conflation. We call such a conflation approach a task-driven (or mission
specific) conflation approach. This approach shows the synergy of
mathematical and spatio-visual analyses in the decision-making process. The
task-driven approach intends to conflate data relative to a given decision
problem. Different tasks may set different requirements for conflation accuracy.
Sometimes a user requests a “best possible conflation” but practice has
shown that the “best possible conflation” might be too inconsistent for the
specific task at hand. Alternatively, such a “best possible conflation” can
consume critical time and effort far beyond the real need. For instance, in
time-critical applications such as rescue operations, we may not need high
accuracy for an area that is not in a restrictive area of operation.

Assessment of mobility/trafficability of the area for heavy vehicles is an
example of a task that can benefit from the task-driven approach. Assessment
of differences in elevations and depth of water zones (both represented by the z
coordinate distribution in the area) are among the critical components of this
task. This task may be less sensitive to errors in assessing horizontal
distances in x and y coordinates than in z coordinates in each of the maps being
conflated. For instance, a two-meter error for x and y might be acceptable
while such a deviation might be too large for the z coordinate.

Another task could be the dynamic assessment of flood damage to an area
by combining a map of the area before the flood and aerial photos taken every
two hours since the flood began. The quantitative goal of this task is com- 
puting the area of the flood zone under water. Thus, in contrast with the mo- 
bility task, this task may need higher accuracy in the values of x and y coor- 
dinates while the z coordinate might need to be less accurate. Requirements 
for a third task of assessing mobility in the flood zone may differ from both 
of the first two tasks. It may require high accuracy for all three x, y, and z 
coordinates. 

In order to concentrate efforts for the task-driven approach, we need to
identify specific tasks more formally. Specifically, a task can be represented as a
triple $A = \langle G, K, D \rangle$, where G is a goal, K is domain-specific knowledge
that includes the ontology, and D is the available data. The goal G for a
well-posed task also provides a criterion, $C_G$, for identifying that the goal of
the task A has been reached, i.e., $C_G(A) = 1$ (true).
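
A task triple of this kind maps naturally onto a small data structure. The sketch below is one possible encoding, not a definitive design: the class names, the field names, and the tolerance values for the mobility task are all hypothetical, chosen only to echo the two-meter horizontal error discussed above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping


@dataclass(frozen=True)
class Goal:
    description: str
    # Criterion for "goal reached"; takes (knowledge, data), returns bool.
    criterion: Callable[[Mapping[str, Any], Mapping[str, Any]], bool]


@dataclass(frozen=True)
class Task:
    goal: Goal                     # G
    knowledge: Mapping[str, Any]   # K, domain-specific knowledge
    data: Mapping[str, Any]        # D, available data

    def goal_reached(self) -> bool:
        return self.goal.criterion(self.knowledge, self.data)


def within_tolerances(knowledge, data):
    """Hypothetical criterion: each measured error is within tolerance."""
    return (data["xy_error_m"] <= knowledge["xy_tolerance_m"]
            and data["z_error_m"] <= knowledge["z_tolerance_m"])


mobility_goal = Goal("assess trafficability for heavy vehicles",
                     within_tolerances)
# Assumed tolerances: loose horizontally, tight vertically.
mobility_knowledge = {"xy_tolerance_m": 2.0, "z_tolerance_m": 0.5}

good = Task(mobility_goal, mobility_knowledge,
            {"xy_error_m": 1.8, "z_error_m": 0.3})
bad = Task(mobility_goal, mobility_knowledge,
           {"xy_error_m": 1.8, "z_error_m": 0.8})
print(good.goal_reached(), bad.goal_reached())
```

Keeping the tolerances in the knowledge component means a flood-area task could reuse the same criterion with tight horizontal and loose vertical tolerances.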



5. CONFLICT RESOLUTION BY ANALYTICAL 
AND VISUAL CONFLATION AGENTS 

In this section, we demonstrate how a set of task-driven intelligent
conflation agents can accomplish the task-driven conflation approach. An agent
can carry out a single task while a community of agents can carry out a wide
array of tasks. Such agents operate with multiple feature representations and
resolve conflation conflicts using rules or other conflict resolution strategies
according to the task at hand. For instance, if the task is a global strategic
overview of a country's conditions, then a strategic conflation agent is
activated. If the goal is to support a local reconnaissance unit, then the system
monitor should activate a specialized local reconnaissance conflation
agent.

In [Doytsher et al., 2001; Rahimi et al., 2002] hierarchies of spatial
conflation agents (CA) are suggested. The top-level agents are defined
according to the type of matching geometric elements they use: points, lines, or
areas. Agents based on matching points are called point agents. Point
agents are further classified into subcategories. For instance, some agents can
match images using corner points on buildings (building agents).

Other agents can match images using distinguishing points on roads such
as intersections and turning points. Agents based on lines (line agents) can
similarly be classified further as building, road, or railroad agents. For
instance, a line building agent matches images using lines on buildings such as
roof lines. Finally, there are agents based on areas (area agents, e.g., a lake
agent). The reasoning behind the introduction of the dynamic selection of
conflation agents versus a static approach with a single conflation agent is the
flexibility of the dynamic approach. The dynamic approach is task-specific; a
specialized agent can be selected depending on the task at hand. A dynamic system can
monitor and map discrepancies related to a specific user's task and select an
appropriate conflation agent to resolve the problem.
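
Dynamic selection from such a hierarchy can be sketched as a lookup keyed by geometry type and subject. The registry contents follow the examples in the text (point, line, and area agents for buildings, roads, railroads, and lakes), but the class, the `select_agent` function, and the lookup scheme itself are illustrative assumptions rather than the design of the cited systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConflationAgent:
    geometry: str  # top level: "point", "line", or "area"
    subject: str   # subcategory, e.g. "building", "road", "lake"

    @property
    def name(self):
        return f"{self.geometry} {self.subject} agent"


# Hypothetical registry built from the examples mentioned in the text.
_AGENTS = [
    ConflationAgent("point", "building"),
    ConflationAgent("point", "road"),
    ConflationAgent("line", "building"),
    ConflationAgent("line", "road"),
    ConflationAgent("line", "railroad"),
    ConflationAgent("area", "lake"),
]
REGISTRY = {(a.geometry, a.subject): a for a in _AGENTS}


def select_agent(geometry, subject):
    """Pick the specialized agent for the task at hand."""
    agent = REGISTRY.get((geometry, subject))
    if agent is None:
        raise LookupError(f"no agent for {geometry}/{subject}")
    return agent


print(select_agent("line", "building").name)
```

A static approach would correspond to always returning the same agent; the registry makes the task-specific choice explicit and easy to extend.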

The prototype described in [Rahimi et al., 2002] provides a set of menus
for a user: (i) to declare the conflation tools and their parameters to be used,
(ii) to select a method for determining matched features, and (iii) to select
a method for evaluating links. Below we briefly describe some conflation
agents [Doytsher et al., 2001].

The point agent (PA1)

• selects points (nodes) as counterpart features,

• builds local rubber-sheeting transformations based on the selected
counterpart points, and

• converts one map to another map by using the found
transformations.

This agent is adequate only for cases where rubber-sheeting between
known control points does not create significant errors in matching
intermediate points that differ from control points.

The line agent (LA1) conflates maps by running code that implements
the following sequence of algorithmic steps:

• detecting counterpart linear features,

• partitioning the whole map into sub-regions according to the
network of counterpart lines,

• transforming counterpart elements to their new positions, and

• transforming the remaining elements within each subregion
according to the boundary transformation.

Chapter 19 provides another line agent (LA2) based on the algorithm called 
Algebraic BSD (algebraic sequential binary division). 

Conflation agents [Rahimi et al., 2002] fulfill several functions: 

• eliminating non-matches using simple measures that can be 
quickly computed, 

• determining potential matches using more subtle checks such as 
attribute set and value similarity, and 

• determining whether the overall matching score computed as a 
function of attribute score, geometry score, topology score and 
filter score, exceeds the threshold level. 

Topological matching approaches are based on graph theory [Lynch &
Saalfeld, 1985] and artificial intelligence methods using if-then rules
[McKeown, 1987; Cobb et al., 1998].

The overall score approach taken in [Rahimi et al., 2002], where
individual scores are combined in a single overall matching score, has a well-known
drawback: it is hard to justify a specific form of the converting function.
Different converting functions can provide different feature matches. This is
a common problem of converting a multi-criteria task to a single-criterion
task. Functions can follow different heuristic strategies (e.g., “optimistic”,
“pessimistic”, or “average”) as we have discussed above. Often these choices
are highly subjective. In truly multi-criteria situations where different
converting functions provide different matching results, we suggest using a
visual conflation agent. This agent provides tools for a human expert to
analyze the context of the situation in depth using unique human visual
abilities.

An attribute matching function is introduced in [Rahimi et al., 2002]. In
this approach, each feature object is considered as a set of attribute-value
pairs:

$$((a_{i1}, v_{i1}), (a_{i2}, v_{i2}), \ldots, (a_{ik}, v_{ik}), \ldots, (a_{in}, v_{in})),$$
$$((a_{j1}, v_{j1}), (a_{j2}, v_{j2}), \ldots, (a_{jk}, v_{jk}), \ldots, (a_{jn}, v_{jn})),$$

where $a_{ik}$ identifies attribute k for feature $f_i$, $v_{ik}$ identifies the value of
attribute $a_k$ for feature $f_i$, and the terms for feature $f_j$ on the second line are
defined similarly. For matching numeric attributes $a_k$, membership matching
functions are used, and for linguistic attributes similarity tables are used. The
overall Matching Score (MS) for attributes is given by:

$$MS_{ij} = \frac{1}{N} \sum_{k=1}^{N} S_k(f_i, f_j) \times W_k,$$

where $S_k(f_i, f_j)$ is the similarity function between features $f_i$ and $f_j$ for their
attribute $a_k$, N is the number of attributes that are common to both features $f_i$
and $f_j$, and $W_k$ is the weight computed by the rule-based expert system for the
attribute $a_k$. For instance, a rule for computing weights $W_k$ and $W_m$ for
attributes $a_k$ and $a_m$ could be:

If $v_{1k} = 1$ & $v_{2k} = 3$ & $v_{1m} = v_{2m} = 10$, then $W_k = 0.8$ & $W_m = 0.4$.
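
The matching score is a weighted sum of per-attribute similarities divided by the number of common attributes. A minimal sketch follows; the attribute names and similarity values are hypothetical, and the weights simply reuse the 0.8 and 0.4 from the rule above. In the cited approach the similarities would come from membership functions or similarity tables and the weights from the rule-based expert system.

```python
def matching_score(similarities, weights):
    """Weighted attribute matching score over the common attributes.

    similarities maps each common attribute to its similarity value;
    weights maps each of those attributes to its weight.
    """
    if not similarities:
        raise ValueError("features share no attributes")
    missing = set(similarities) - set(weights)
    if missing:
        raise KeyError(f"no weight for attributes: {sorted(missing)}")
    n = len(similarities)
    return sum(similarities[k] * weights[k] for k in similarities) / n


# Hypothetical attributes shared by two road features.
similarities = {"width": 1.0, "surface": 0.5}
weights = {"width": 0.8, "surface": 0.4}
print(matching_score(similarities, weights))
```

Note that, because the sum is divided by N rather than by the sum of the weights, the score stays below 1 even for identical features unless all weights equal 1.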

Lakin [Lakin, 1994; Lakin 1987] described Visual Agents (VAs) as
software entities that assist people in performing graphical tasks, such as
making a text-and-graphic record of a group meeting, live and on-the-fly,
and showing it to participants during the meeting to enhance collaboration.
The group members can see the record as it is being made, offering
suggestions and corrections. For instance, a visualization agent can act as a
whiteboard assistant helping to graphically record the conversation and concepts
of a working group on a large display. The visualization agent can help
display on-the-fly global objectives, immediate goals, tools, factual data and
R&D options that are being discussed by a group that is making a decision on
a business strategy.

Similarly, a conflation visualization agent may have complete access to
the actions of an imagery analyst performing the conflation in collaboration with other
analysts and software conflation agents. Thus, it can be a true visual
human-computer collaboration in problem solving. The visual agent includes
computational engines for processing text-graphic activity, both static images
resulting from the activity as well as actual moment-to-moment dynamics of
the activity itself.



6. CONCLUSION 



Imagery integration is a conflict-resolution decision problem among
disparate data with inconsistencies due to scale, resolution, compilation
standards, source accuracy, registration, and many other factors. There are
representational challenges as well as challenges arising from uncertainty and
how to visualize it. This chapter reviews several imagery integration techniques and concludes
that no single method is suitable for all data sets and for every task. The
most effective approach is to create a general framework for carrying out
image integration and develop task-specific measures of decision correctness
for each process.

It was shown that a set of task-driven intelligent conflation agents can
accomplish the task-driven conflation approach. An agent can carry out a
single task while a community of agents can carry out a wide array of tasks. Such
agents operate with multiple feature representations and resolve conflation
conflicts using rules or other conflict resolution strategies according to the
task at hand.

8. EXERCISES AND PROBLEMS

1. Suggest a classification of task-driven conflation rules that are
conceptually described in the section devoted to the rule-based and task-driven
approach.

2. Design a framework to conflate two Digital Elevation Models (DEM) 
with different resolutions and areas of coverage. 

3. Design a way to visualize the areas of agreement and disagreement in this 
conflation and use this information to improve the conflation. 

4. Design a framework to conflate a raster satellite image and vector stream 
data with different resolutions and areas of coverage. 

5. Design a way to visualize the areas of agreement and disagreement in this 
conflation and use this information to improve the conflation. 

Abstract: This chapter addresses imagery conflation and registration problems by
providing an Analytical and Visual Decision Framework (AVDF). This
framework recognizes that pure analytical methods are not sufficient for integrating
images. Conflation refers to a process similar to but more complex than what is
traditionally called registration, in the sense that there is, at least, some
conflicting information that predates it and a post-conflation evaluation that
postdates it. The conflation process studies the cases of two or more data
sources where each is inaccurate and none of them is perfect. The chapter
covers complexity space, conflation levels, error structure analysis, and a
rules-based conflation scenario. Without AVDF, the mapping between two
input data sources is more opportunistic than definitive. A partial differential
equation approach is used to illustrate the modeling of disparities between data
sources for a given mapping function. A specific case study of AVDF for
pixel-level conflation is presented based on Shannon’s concept of mutual
entropy.

Key words: Imagery conflation, registration, analytical and visual decision framework,
complexity space, conflation level, rule base, entropy, mutual information.



1. INTRODUCTION 

It is self-evident that major modern scientific and technological endeavors,
e.g., precision farming, space and resources (oil and gas) exploration,
and security and monitoring, use imagery for diagnostics, measurement,
feature extraction, and decision making.

Scientists, engineers and managers base their decisions on increasingly
complex and higher dimensional images captured by new instruments and/or
generated using models and algorithms through vastly different scales. These
images may be composed from different viewing angles with different
physical characteristics and environment constraints. Common to these
scientific and application efforts is the challenge of performing information
assimilation from multiple modality imagery sources to provide sufficient
evidence so that a decision can be made with incomplete information and
under operational constraints, e.g., real-time practices.

The Analytical and Visual Decision Making (AVDM) framework refers to
a process using visual environments by and/or for decision makers to acquire
quality information to support spatial decision making. This framework
advocates a mission-specific approach (MSA). Mission specific
is defined as a generic scope augmented with a specific task; e.g., map
making is a generic job whereas making a road map in area X is a
mission-specific task. The integration of information from different maps is in general a
generic conflation job, whereas the conflation of roads for trafficability
evaluation using different data sources in a well-defined area is a
mission-specific task. In other words, a conflation process will have a set of data
sources with a well-defined time, spatial, and attribute framework within 
which conflicts can be modeled and managed. In general, the state space at- 
tributes outside the conflict extents serve as the reference framework. 

In general, visual decision making is a nonlinear process either purely 
visual or combined with analytic means. For example, conflation can be ei- 
ther a very complex decision making task for trafficability assessment under 
a combat situation (visual) or a simple translation function for a well mapped 
local street from high quality resolution imagery (analytic). Prospecting in 
oil and gas exploration is a typical conflation type decision making process 
using combined visual and analytic means. 

The purpose of conflation, according to the National Technical Alliance
(NTA), is to create a third dataset that is better than either of the
original sources by combining information from the two. The report by
Swiftsure Spatial Systems Inc. [2002] concluded that no one has yet achieved
fully automated conflation, and that vector-to-imagery conflation is
required for future development of the method.

Conflation is a process of identification, correction, and synthesis of dis- 
parate information including individual features from multiple imagery (lit- 
eral and non-literal) and/or vector sources. Conflation consists of three types: 
vector-to-vector, imagery-to-imagery and imagery-to-vector or vice versa.

For vector images with identified features (e.g., ESRI shape file format),
conflation means finding features or segments, both matched and unmatched.
For raster imagery, theoretically, two images are conflated if the values of
all corresponding pixels are equivalent, that is, the matching ratio R = 1.0
in the object space, given fully registered and calibrated spatial and
spectral responses. For conflation between vector and raster images,
conflation means the establishment of correspondence for features from all
given objects.
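The raster criterion above can be illustrated with a minimal Python sketch. It assumes, since the text does not define it explicitly, that the matching ratio R is the fraction of corresponding pixels whose values agree. The function name `matching_ratio`, the `tolerance` parameter and the sample arrays are hypothetical.

```python
def matching_ratio(image_a, image_b, tolerance=0):
    """Fraction of corresponding pixels whose values agree within `tolerance`.

    Both images are lists of rows and are assumed to be already registered,
    so that pixel (i, j) in one image corresponds to pixel (i, j) in the other.
    """
    if len(image_a) != len(image_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(image_a, image_b)
    ):
        raise ValueError("images must have the same shape")
    total = sum(len(row) for row in image_a)
    if total == 0:
        raise ValueError("images must not be empty")
    matched = sum(
        1
        for row_a, row_b in zip(image_a, image_b)
        for a, b in zip(row_a, row_b)
        if abs(a - b) <= tolerance
    )
    return matched / total


# Invented 2x2 intensity grids, for illustration only.
reference = [[10, 12], [14, 16]]
candidate = [[10, 12], [14, 17]]

print(matching_ratio(reference, reference))  # 1.0: fully conflated
print(matching_ratio(reference, candidate))  # 0.75: one of four pixels differs
```

A non-zero `tolerance` is one possible way to relax "equivalent" for imperfectly calibrated sensors; the chapter itself states only the ideal case R = 1.0.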

The AVDM framework for mission specific (task specific) conflation is a
process of using visuals to reduce the degrees of freedom and/or increase the
efficiency of identifying the relationships between data sets. Here the task
is a triple $A = \langle G, K, D \rangle$, where G is a goal, K is domain
specific knowledge that includes the domain ontology, and D is the available
data. For a well-posed task, the goal G also provides a criterion for
identifying that the goal of the task A has been reached. In formal terms,
the goal criterion can be expressed as a predicate, $C_G$, such that if
$C_G(A) = 1$ (true) for the task A, then the goal G has been reached.
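As a minimal sketch of this formalism, the task triple and its goal predicate can be written as a small Python data structure. The class name `ConflationTask`, its field names and the sample error values are hypothetical; the 5 m elevation limit is borrowed from the trafficability example given later in the chapter.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class ConflationTask:
    """Task A = (G, K, D) together with its goal criterion C_G."""

    goal: str                  # G: what the task must achieve
    knowledge: Dict[str, Any]  # K: domain specific knowledge
    data: Dict[str, Any]       # D: available data
    goal_criterion: Callable[["ConflationTask"], bool]  # C_G

    def goal_reached(self) -> bool:
        """Evaluate C_G(A); True means the goal G has been reached."""
        return self.goal_criterion(self)


def elevation_within_limit(task: ConflationTask) -> bool:
    """Example criterion: every elevation error is within the allowed limit."""
    limit = task.knowledge["max_elevation_error_m"]
    return all(abs(err) <= limit for err in task.data["elevation_errors_m"])


task = ConflationTask(
    goal="road elevation errors small enough to support trafficability",
    knowledge={"max_elevation_error_m": 5.0},
    data={"elevation_errors_m": [1.2, -3.4, 4.9]},  # invented sample values
    goal_criterion=elevation_within_limit,
)
print(task.goal_reached())  # True
```

Keeping the criterion as a separate callable mirrors the idea that a well-posed task carries its own test of completion.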

To a certain extent, when a well-defined mapping function exists between
two data sets, registration can be treated as a simplified case of conflation.
Extensive reviews of registration methods can be found in [Brown, 1992;
Zitova & Flusser, 2003]. An NSF-funded research-planning workshop on
approaches to combat terrorism [Moniz & Baldeschwieler, 2002] also listed
image registration as an open problem:

... an important area of research is the registration of images from differ- 
ent times and modalities. Registration of such images onto a single coor- 
dinate system is vital for automated analysis but can be extremely chal- 
lenging. The lack of robust image registration algorithms remains a limit- 
ing factor in many fields. 

Registration is closely related to, but differs from, conflation. The goal of
registration is to provide geo-reference for a pair of images without matching
individual features. Traditionally, registration is a process of seeking the
mapping function among all the data points using a subset of them, the
so-called control points. Conflation, on the other hand, includes the
matching of features. Thus, registration can provide a less specific match
than conflation and, in essence, conflation is a tougher challenge.
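To make the control-point idea concrete, the following Python sketch estimates one global mapping function from matched control points by least squares. It assumes the similarity model discussed in Section 2.2, with the standard coordinates equal to a translation plus a scaled rotation of the conflation-source coordinates. The function names and the sample points are hypothetical, and this is only one of many possible registration methods.

```python
import math


def apply_similarity(point, scale, theta, x_t, y_t):
    """Map a conflation-source point (X_C, Y_C) into the standard frame."""
    xc, yc = point
    xs = x_t + scale * (math.cos(theta) * xc - math.sin(theta) * yc)
    ys = y_t + scale * (math.sin(theta) * xc + math.cos(theta) * yc)
    return xs, ys


def fit_similarity(control_c, control_s):
    """Least-squares scale, rotation and translation from control point pairs."""
    n = len(control_c)
    if n != len(control_s) or n < 2:
        raise ValueError("need at least two matched control point pairs")
    mxc = sum(p[0] for p in control_c) / n
    myc = sum(p[1] for p in control_c) / n
    mxs = sum(p[0] for p in control_s) / n
    mys = sum(p[1] for p in control_s) / n
    num_a = num_b = den = 0.0
    for (xc, yc), (xs, ys) in zip(control_c, control_s):
        xc, yc, xs, ys = xc - mxc, yc - myc, xs - mxs, ys - mys
        num_a += xc * xs + yc * ys
        num_b += xc * ys - yc * xs
        den += xc * xc + yc * yc
    if den == 0.0:
        raise ValueError("control points must not all coincide")
    a, b = num_a / den, num_b / den  # a = S*cos(theta), b = S*sin(theta)
    x_t = mxs - a * mxc + b * myc
    y_t = mys - b * mxc - a * myc
    return math.hypot(a, b), math.atan2(b, a), x_t, y_t


# Synthetic, noise-free control points generated from known parameters.
true_params = (2.0, math.radians(30.0), 5.0, -3.0)
control_c = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0)]
control_s = [apply_similarity(p, *true_params) for p in control_c]

estimated = fit_similarity(control_c, control_s)
print(estimated)
```

With noise-free points the fit recovers the generating parameters; with real control points the result is an average over all pairs and matches no individual feature exactly, which is the sense in which registration is less specific than conflation.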

Source conflict and information change diagnosis are the characteristics of
the conflation process, while seeking the mapping function between two
images is the key to registration. Typical cases for conflation include the
matching of a subset of roads among the road networks or across a mapping
boundary, the mapping of extracted features across different scales or
resolutions, and the combining of information from sources with large
spectral variation/change due to viewing geometry, environment variation, or
technology evolution. Conflation could also come from the need for
information integration from multiple disparate sources; e.g., reservoir
characterization using stratigraphy based on well logs and seismic cross
sections. Conflation provides a unique application for applying AVDM
techniques.



2. IMAGE INCONSISTENCIES 



2.1 Local and global inconsistencies 

Typically, geometric disparities or inconsistencies (misalignment) may
be global, local or subpixel in scope. Global disparity, scene/image wide, is
analogous to similarity and direct linear transformation models in the
domain of photogrammetry [Ghosh, 1998]. Local disparity, on the order of 10s
to 100s of pixels, parallels the use of affine and Rational Polynomial
Coefficient transformations for registration. While all these models have
analytical solutions, the modeling of subpixel disparities remains an elusive
research objective.

Global Inconsistency: Figure 1 illustrates a complex case of global
inconsistency where multiple information sources exist. Images 1 through n
are the information sources for the digitized vector products 1 through m.
There might exist a one-to-one mapping between each pair of imagery and/or
products that meets some preset quality measure for registration. However,
there may not exist a single global mapping function for all the imagery
sources and their products. In addition, an overall solution for a mapping
function may not satisfy mission specific criteria even though it satisfies
a least-squares error requirement. In an AVDM framework, the modeling of
this overall global function only serves the function of conflation
diagnostics. The purpose of conflation is to increase the precision and
accuracy beyond what a registration type function can provide for some
given mission specific requirements.

Local Inconsistency: Figure 2 illustrates the existence of multiple
mapping functions for different objects of interest given a pair of imagery
and its vector product. The correspondence mapping for Road 172 between the
map and imagery is relatively effortless, either visually or mathematically.
The same cannot be said about all of the Camp Lejeune and Marines roads. For
example, the road marked straight (Marines) in the vector map appears to be
curved in the imagery. Because of this obvious difference in visual
signature, the mapping function for the Marines road will be very different
from that of Road 172. A solution based on registration will be inadequate
because it is an average among all the potential conjugate points, whereas
the solution for a specific set of features requires the partition of
feature sets. The limited extent of the feature mismatch makes the matching
problem “local” in nature. In this case, neither the global nor the local
solution alone may be used for a mission specific solution, but together
they may. Furthermore, the local solution may differ from one road (Marines)
to the next (Camp Lejeune); thus the mission specific nature of conflation.
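The averaging effect described above can be shown with a deliberately simplified Python sketch. It assumes a translation-only mapping and uses invented coordinates for two features, one shifted rigidly and one bent; none of these numbers come from Figure 2, and the function names are hypothetical.

```python
import math


def mean_offset(pairs):
    """Least-squares translation (dx, dy) for (vector_point, image_point) pairs."""
    n = len(pairs)
    dx = sum(img[0] - vec[0] for vec, img in pairs) / n
    dy = sum(img[1] - vec[1] for vec, img in pairs) / n
    return dx, dy


def rms_residual(pairs, offset):
    """Root-mean-square distance left after shifting vector points by offset."""
    dx, dy = offset
    squared = [
        (img[0] - vec[0] - dx) ** 2 + (img[1] - vec[1] - dy) ** 2
        for vec, img in pairs
    ]
    return math.sqrt(sum(squared) / len(squared))


# Invented coordinates: feature A is shifted rigidly, feature B is bent.
feature_a = [((0, 0), (2, 1)), ((10, 0), (12, 1)), ((20, 0), (22, 1))]
feature_b = [((0, 10), (8, 15)), ((10, 10), (18, 17)), ((20, 10), (28, 15))]

global_offset = mean_offset(feature_a + feature_b)  # one mapping for everything
local_offset_a = mean_offset(feature_a)             # mapping for feature A only
local_offset_b = mean_offset(feature_b)             # mapping for feature B only

print(rms_residual(feature_a, global_offset))   # large: the average fits neither
print(rms_residual(feature_a, local_offset_a))  # zero: A is a pure shift
print(rms_residual(feature_b, local_offset_b))  # non-zero: B needs a richer model
```

Partitioning the feature set removes the residual for the rigidly shifted feature, while the bent feature still shows a residual under a pure translation, so the local mapping itself differs from feature to feature.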




Figure 1. Inconsistency among imagery sources and their products 




Figure 2. Local inconsistency between imagery source and
vector product. See also color plates.

Human activities and natural occurrences are the two most common
causes of the inconsistencies that occur in the imagery and its derived
products. Among all the conflicts, vector-to-vector is the most studied case.
Its disparities originate frequently from multi-source data over a common
observation area.

The disparities come from two sources, i.e., raw data and vector
generation, due to scale, resolution, compilation standards, operator
license, source accuracy, registration, sensor characteristics, currency,
temporality, or errors [Edwards & Simpson, 2002].

The disparities are of two types of inconsistencies, i.e., spatial and
attribute. Spatial disparities tend to be analytic, while attribute
disparities are due to decision-making by operators based on intensities.
Other characteristics of inconsistency are that it is discrete in space and
time, variable in magnitude and direction, and often incomplete. Below is a
case study to illustrate the disparity structure given a pair of information
sources and a mapping function.

2.2 Disparity structure analysis and inconsistencies 
decomposition 

The use of geospatial information is becoming more and more 
automated. The trend towards automation demands better data quality and 
confidence measures. In the case of conflation, both data sources carry inac- 
curacies and render the standard error analysis, the dependent-and- 
independent variable analysis methodology, less useful. 

For conflation, a mapping function, continuous or discrete, between the
input data sources is required, but it is often incomplete. The function
used is often arbitrarily chosen and only approximate. Thus, an
understanding of error sources in their fundamental form becomes an important
factor in conflation decision-making. In this section, partial derivatives are
applied to a basic mapping function, a similarity transformation, to reveal
the disparity structure in conflation resolution. Disparity structure is
used here to refer to the inconsistency between the data sources, in contrast
to error, which is reserved for the difference between a measurement and the
truth.

The compact form of a typical photogrammetric similarity transformation
is described in equation (1). Equation (1) has three components:
translation, rotation and a single scaling factor. $(X_T, Y_T)$ are the
translations along the X and Y axes, $S$ is the scale factor, and $\theta$ is
the rotation angle between the two sources. Equation (2) is its expanded
form.

$$\begin{pmatrix} X_T \\ Y_T \end{pmatrix} = \begin{pmatrix} X_S \\ Y_S \end{pmatrix} - S \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} X_C \\ Y_C \end{pmatrix} \tag{1}$$

where: $X_T$, $Y_T$ are the translations,

$X_S$, $Y_S$ are the coordinates from the standard,

$X_C$, $Y_C$ are the coordinates from the conflation source,

$S$ is the single scaling factor, and $\theta$ is the rotation.

$$\begin{aligned}
X_T &= X_S - S\cos\theta\,X_C + S\sin\theta\,Y_C \\
Y_T &= Y_S - S\sin\theta\,X_C - S\cos\theta\,Y_C
\end{aligned} \tag{2}$$

For both equations (1) and (2), the assumption is that the similarity trans- 
formation is the known functional relationship between the two data sources, 
with the translation, rotation and scaling as the unknown parameters. Note 
that equations (1) and (2) represent a relatively special case of a general af- 
fine transformation, mathematically speaking. Thus, a more general theory 
would work with an arbitrary affine transform, but make the physical mean- 
ing of the derived parameter set less clear, and defeat the purpose of pursu- 
ing disparity structure understanding. Equation (3) is a set of partial deriva- 
tives for translation, assuming a constant scaling factor. 

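$$\begin{aligned}
\partial X_T/\partial X_S &= 1 \\
\partial Y_T/\partial Y_S &= 1 \\
\partial X_T/\partial X_C &= -S\cos\theta \\
\partial Y_T/\partial X_C &= -S\sin\theta \\
\partial X_T/\partial Y_C &= S\sin\theta \\
\partial Y_T/\partial Y_C &= -S\cos\theta \\
\partial X_T/\partial\theta &= S\sin\theta\,X_C + S\cos\theta\,Y_C \\
\partial Y_T/\partial\theta &= -S\cos\theta\,X_C + S\sin\theta\,Y_C
\end{aligned} \tag{3}$$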
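The partial derivatives in equation (3) can be checked numerically against equation (2). The Python sketch below is such a check under the assumption that the scale factor S is held constant, as the text states; the function names and the sample values are hypothetical.

```python
import math


def translation(xs, ys, xc, yc, s, theta):
    """Equation (2): translations implied by a standard/conflation point pair."""
    x_t = xs - s * math.cos(theta) * xc + s * math.sin(theta) * yc
    y_t = ys - s * math.sin(theta) * xc - s * math.cos(theta) * yc
    return x_t, y_t


def analytic_partials(xc, yc, s, theta):
    """Equation (3): (dX_T/dv, dY_T/dv) for each variable v."""
    c, n = math.cos(theta), math.sin(theta)
    return {
        "xs": (1.0, 0.0),
        "ys": (0.0, 1.0),
        "xc": (-s * c, -s * n),
        "yc": (s * n, -s * c),
        "theta": (s * n * xc + s * c * yc, -s * c * xc + s * n * yc),
    }


def numeric_partials(xs, ys, xc, yc, s, theta, h=1e-6):
    """Central finite differences of equation (2), one variable at a time."""
    base = {"xs": xs, "ys": ys, "xc": xc, "yc": yc, "theta": theta}
    result = {}
    for name in base:
        hi = dict(base)
        lo = dict(base)
        hi[name] += h
        lo[name] -= h
        fx_hi, fy_hi = translation(s=s, **hi)
        fx_lo, fy_lo = translation(s=s, **lo)
        result[name] = ((fx_hi - fx_lo) / (2 * h), (fy_hi - fy_lo) / (2 * h))
    return result


# Invented sample point and parameters.
sample = dict(xs=120.0, ys=80.0, xc=40.0, yc=25.0, s=1.5, theta=0.3)
analytic = analytic_partials(sample["xc"], sample["yc"], sample["s"], sample["theta"])
numeric = numeric_partials(**sample)
for name in analytic:
    print(name, analytic[name], numeric[name])
```

The two zero entries (for example the derivative of X_T with respect to Y_S) are not listed in equation (3) but follow directly from equation (2).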

Equation (4) calculates the total differential along the X direction using 
the derived partial derivatives. This total differential includes nonlinear com- 
ponents for angle measurements. 

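$$\begin{aligned}
dX_T &= \frac{\partial X_T}{\partial X_S}\,dX_S + \frac{\partial X_T}{\partial X_C}\,dX_C + \frac{\partial X_T}{\partial Y_C}\,dY_C + \frac{\partial X_T}{\partial\theta}\,d\theta \\
&= dX_S - S\cos\theta\,dX_C + S\sin\theta\,dY_C + (S\sin\theta\,X_C + S\cos\theta\,Y_C)\,d\theta
\end{aligned} \tag{4}$$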

Equation (5), similarly, calculates the total differential along the Y direc- 
tion using the derived partial derivatives. This total differential is also a 
combination of linear and nonlinear components. The Y components are 
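generally symmetrical to the total differential along the X direction.

$$\begin{aligned}
dY_T &= \frac{\partial Y_T}{\partial Y_S}\,dY_S + \frac{\partial Y_T}{\partial X_C}\,dX_C + \frac{\partial Y_T}{\partial Y_C}\,dY_C + \frac{\partial Y_T}{\partial\theta}\,d\theta \\
&= dY_S - S\sin\theta\,dX_C - S\cos\theta\,dY_C + (-S\cos\theta\,X_C + S\sin\theta\,Y_C)\,d\theta
\end{aligned} \tag{5}$$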
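As a sanity check on equations (4) and (5), the Python sketch below compares the total differentials with the actual change in the translations of equation (2) when all inputs are perturbed slightly. The sample point, the perturbation sizes and the function names are hypothetical; the scale factor is held constant as in the text.

```python
import math


def translation(xs, ys, xc, yc, s, theta):
    """Equation (2): translations for a standard/conflation point pair."""
    x_t = xs - s * math.cos(theta) * xc + s * math.sin(theta) * yc
    y_t = ys - s * math.sin(theta) * xc - s * math.cos(theta) * yc
    return x_t, y_t


def total_differential(xc, yc, s, theta, d_xs, d_ys, d_xc, d_yc, d_theta):
    """Equations (4) and (5): first-order change (dX_T, dY_T)."""
    c, n = math.cos(theta), math.sin(theta)
    d_xt = d_xs - s * c * d_xc + s * n * d_yc + (s * n * xc + s * c * yc) * d_theta
    d_yt = d_ys - s * n * d_xc - s * c * d_yc + (-s * c * xc + s * n * yc) * d_theta
    return d_xt, d_yt


# Invented sample point and small perturbations of every input.
xs, ys, xc, yc, s, theta = 120.0, 80.0, 40.0, 25.0, 1.5, 0.3
d_xs, d_ys, d_xc, d_yc, d_theta = 0.02, -0.01, 0.015, 0.01, 1e-4

predicted = total_differential(xc, yc, s, theta, d_xs, d_ys, d_xc, d_yc, d_theta)
before = translation(xs, ys, xc, yc, s, theta)
after = translation(xs + d_xs, ys + d_ys, xc + d_xc, yc + d_yc, s, theta + d_theta)
actual = (after[0] - before[0], after[1] - before[1])

print("predicted:", predicted)
print("actual:   ", actual)
```

The small gap between the two results comes from second-order terms, mostly products of the angle perturbation with the coordinate perturbations, which a first-order differential does not capture.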

In an attempt to relate the total disparity to the original information 
sources, equation (6) calculates the square of the total differential along both 
X and Y directions as the function of measurements. 

Equations (1) through (5) represent the disparity structure that carries 
an orthogonal component assumption with a description of various types of 
disparity terms, their relationship and relative importance, while equation (6) 
describes the combined total under the orthogonal assumption. Some of the 
terms are data source related, and some of them only reveal themselves when 
information sources are combined. 

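$$\begin{aligned}
dX_T^2 + dY_T^2 ={}& dX_S^2 + S^2\cos^2\theta\,dX_C^2 + S^2\sin^2\theta\,dY_C^2 \\
&+ S^2(\sin^2\theta\,X_C^2 + \cos^2\theta\,Y_C^2)\,d\theta^2 - 2S\cos\theta\,dX_C\,dX_S \\
&+ 2S\sin\theta\,dY_C\,dX_S + 2S\sin\theta\,X_C\,d\theta\,dX_S + 2S\cos\theta\,Y_C\,d\theta\,dX_S \\
&- 2S^2\cos\theta\sin\theta\,dY_C\,dX_C - 2S^2\cos\theta\sin\theta\,X_C\,d\theta\,dX_C \\
&- 2S^2\cos^2\theta\,Y_C\,d\theta\,dX_C + 2S^2\sin^2\theta\,X_C\,d\theta\,dY_C \\
&+ 2S^2\cos\theta\sin\theta\,Y_C\,d\theta\,dY_C + 2S^2\cos\theta\sin\theta\,X_C Y_C\,d\theta^2 \\
&+ dY_S^2 + S^2\sin^2\theta\,dX_C^2 + S^2\cos^2\theta\,dY_C^2 \\
&+ S^2(\cos^2\theta\,X_C^2 + \sin^2\theta\,Y_C^2)\,d\theta^2 - 2S\sin\theta\,dX_C\,dY_S \\
&- 2S\cos\theta\,dY_C\,dY_S - 2S\cos\theta\,X_C\,d\theta\,dY_S + 2S\sin\theta\,Y_C\,d\theta\,dY_S \\
&+ 2S^2\cos\theta\sin\theta\,dY_C\,dX_C + 2S^2\cos\theta\sin\theta\,X_C\,d\theta\,dX_C \\
&- 2S^2\sin^2\theta\,Y_C\,d\theta\,dX_C + 2S^2\cos^2\theta\,X_C\,d\theta\,dY_C \\
&- 2S^2\cos\theta\sin\theta\,Y_C\,d\theta\,dY_C - 2S^2\cos\theta\sin\theta\,X_C Y_C\,d\theta^2
\end{aligned} \tag{6}$$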

The simplified version of equation (6) is presented in equation (7) for the
square of the total disparity, where one of the two information sources is
assigned as the standard (S) information source and the other as the
conflicting (C) information source. This assignment is arbitrary because
both of them have their original source errors, which lead to the
conflicting information when they are put together into the same framework
and viewed under a selected mapping function. Before this step of disparity
analysis, conflicting information is simply a concept.
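
$$dX_T^2 + dY_T^2 \;\Rightarrow\; \text{Total Disparity Budget} \tag{7}$$

Disparity due to Standard source:

$$dX_S^2 + dY_S^2 \;\Rightarrow\; \text{Trans. disparity}$$

Disparity due to Conflicting source:

$$\begin{aligned}
S^2\,dX_C^2 + S^2\,dY_C^2 \;&\Rightarrow\; \text{Trans. disparity} \\
S^2 (X_C^2 + Y_C^2)\,d\theta^2 \;&\Rightarrow\; \text{Rotation disparity} \\
2S^2\,d\theta\,(X_C\,dY_C - Y_C\,dX_C) \;&\Rightarrow\; \text{Dependent terms}
\end{aligned}$$

Disparity due to cross terms of Standard and Conflicting sources:

$$\begin{aligned}
2S\,(&\sin\theta\,dY_C\,dX_S - \cos\theta\,dY_C\,dY_S \\
&- \sin\theta\,dX_C\,dY_S - \cos\theta\,dX_C\,dX_S \\
&+ \sin\theta\,X_C\,d\theta\,dX_S - \cos\theta\,X_C\,d\theta\,dY_S \\
&+ \sin\theta\,Y_C\,d\theta\,dY_S + \cos\theta\,Y_C\,d\theta\,dX_S)
\end{aligned}$$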
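The grouping in equation (7) can be verified numerically: the grouped terms should add up to the sum of the squared total differentials from equations (4) and (5). The Python sketch below does this for one invented set of values; the dictionary keys and function names are hypothetical labels for the groups named in equation (7).

```python
import math


def total_differential(xc, yc, s, theta, d_xs, d_ys, d_xc, d_yc, d_theta):
    """Equations (4) and (5): first-order change (dX_T, dY_T)."""
    c, n = math.cos(theta), math.sin(theta)
    d_xt = d_xs - s * c * d_xc + s * n * d_yc + (s * n * xc + s * c * yc) * d_theta
    d_yt = d_ys - s * n * d_xc - s * c * d_yc + (-s * c * xc + s * n * yc) * d_theta
    return d_xt, d_yt


def disparity_budget(xc, yc, s, theta, d_xs, d_ys, d_xc, d_yc, d_theta):
    """Equation (7): total disparity split into its named groups."""
    c, n = math.cos(theta), math.sin(theta)
    return {
        "standard_translation": d_xs ** 2 + d_ys ** 2,
        "conflicting_translation": s ** 2 * (d_xc ** 2 + d_yc ** 2),
        "conflicting_rotation": s ** 2 * (xc ** 2 + yc ** 2) * d_theta ** 2,
        "conflicting_dependent": 2 * s ** 2 * d_theta * (xc * d_yc - yc * d_xc),
        "cross_terms": 2 * s * (
            n * d_yc * d_xs - c * d_yc * d_ys
            - n * d_xc * d_ys - c * d_xc * d_xs
            + n * xc * d_theta * d_xs - c * xc * d_theta * d_ys
            + n * yc * d_theta * d_ys + c * yc * d_theta * d_xs
        ),
    }


# Invented sample point, parameters and perturbations.
args = dict(xc=40.0, yc=25.0, s=1.5, theta=0.3,
            d_xs=0.02, d_ys=-0.01, d_xc=0.015, d_yc=0.01, d_theta=1e-4)

budget = disparity_budget(**args)
d_xt, d_yt = total_differential(**args)

for name, value in budget.items():
    print(f"{name:>24}: {value: .3e}")
print("sum of groups:", sum(budget.values()))
print("dX_T^2+dY_T^2:", d_xt ** 2 + d_yt ** 2)
```

Only the rotation and dependent groups contain the coordinates X_C and Y_C without a standard-source factor, so printing the groups side by side gives a quick view of how much of the budget is location dependent for a given point.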

The total disparity, the left side of equation (6), describes the
conflicting information in detail using sums of squares; its expansion, the
right side of equation (6), has 30 different terms. Equation (7) presents a
choice of grouping of the 30 terms by disparity types; i.e., terms from the
sources designated as the standard and the conflicting, and cross terms
between the standard and conflicting sources. After using partial derivative
analysis and placing the disparities into a standard-conflation framework,
the right side of equation (7) includes several types of disparities that
form a total disparity structure:

• the translation disparity along the X and Y directions,

• their combinations with the scaling factor, labeled as disparity due to
the conflicting source in equation (7),

• the position dependent rotation disparity, and

• the position canceling but rotation dependent disparity.

In equation (7), the cross terms between S and C are all mixed terms, either
position independent or position dependent. The designation “due to
standard source” is for terms related to the standard information source, and
“due to conflicting source” for terms related to the conflicting information
source. The cross terms are related to both standard and conflicting
information sources. The standard and conflicting terms are coded by their
corresponding subscripts.

In essence, the integration of data has some prospect of providing better 
information; but it also presents opportunities to introduce new errors after 
conflation resolution, and potentially larger ones if the mapping function or 
its parameter derivation is deemed incompatible with the data sources. This 
makes AVDF an important component of conflation resolution. Among the 
conflict and cross categories, the assessment of disparity terms that are loca- 
tion dependent is key to visualization and decision-making. 

The coupling and magnification of location dependent and angle- 
measurement error terms arguably will be the most challenging disparity to 
quantify. This necessitates a visual decision making contribution in addition 
to analytic analysis. For example, if the translation error terms from the des- 
ignated standard information source are ignored (as in the standard depend- 
ent-and-independent variable analysis methods), the corresponding dispari- 
ties have to be compensated by the other terms in the equation. This creates 
additional inconsistencies in the subsequent analysis. The illustrated similar- 
ity transformation disparity analysis example shows the fundamental differ- 
ence between conflation and standard registration where one information 
source is considered error free. For more complicated transformation cases, 
direct linear transformations, or more relaxed mathematical models are re- 
quired to map the spatial relationship between information sources. They 
will result in much more complicated disparity terms than using the similar- 
ity transformation equation as the mapping function. The state-of-the-art 
methodology for conflation resolution gets around the complicated and nec- 
essary disparity model by using vector attributes as a linking mechanism 
[Edwards & Simpson, 2002], thus completely bypassing the spatial modeling 
aspects of the process in the first step of the process. AVDM recommends a 
perspective of reducing the dependency of new source disparities on location 
by reducing analytic scopes, using visual cues to aid the conflation process. 



3. AVDM FRAMEWORK AND COMPLEXITIES 
SPACE 

For AVDM, the term registration is defined as finding one and only one
mapping function between two geospatial information sources, and conflation
resolution as matching features (pixel groups) in geospatial information
sources with spatial consistency at a segment (subpixel) level and with
both matched and unmatched features (pixel groups) identified; i.e., the
spatial and feature (pixel group) matching are location and attribute
dependent. Thus, the purpose of registration and conflation is to combine
information from data sources for the potential of creating a new dataset
that is better than the originals. To achieve better fused data quality,
different algorithms and methods are permissible under AVDM at its different
stages of data and information processing. The facts that no one has yet
achieved fully automated conflation and that very few researchers are working
on imagery-to-vector registration point to the reality of the challenges in
the conflation process.

The AVDM framework provides a multi-level and/or multi-stage solution for
registration and conflation using multiple observations with requirements
across multiple scales of geographical extents and different attributes. One
of the advantages of this divide-and-conquer strategy, which utilizes the
potential of consistency at one scale, location or extent for a set of
attributes to diagnose the inconsistency at another, is to manage the
potential complexity at its appropriate time or process stage for the benefit
of reducing uncertainties. The result of AVDF is the improvement of
information quality by simplifying and compartmentalizing the complexities.
Figure 3 shows a graphic presentation, to the first order, of the potential
complexities in registration and conflation in the AVDF. Typically, accuracy
requirements come from a decision-making process for a particular purpose;
e.g., a specific change detection function requiring registration of 1/5
pixel accuracy. The “registration requirement” states the minimum number of
control pairs necessary for a particular algorithm or method to form a
solution in order to achieve the goal set by the accuracy decision-making
process. The “searching requirement” refers to the iterations an algorithm
has to go through to be sure the accuracy requirements have been met. For
example, to locate a 60% common overlapping area for a footprint of 1K×1K
imagery with 1 pixel accuracy, an algorithm is projected to go through
400×400 = 160,000 points of a searching grid without rotation. Thus, Figure 3
can also be viewed as a potential solution space.
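The arithmetic behind the searching requirement can be sketched in Python. The sketch assumes that 1K means 1000 pixels and that a 60% common overlap allows a shift of up to 40% of the footprint along each axis, which reproduces the 400×400 grid quoted above. The function name and the `rotation_steps` parameter are hypothetical additions.

```python
def search_grid_size(footprint_px, min_overlap, accuracy_px=1.0, rotation_steps=1):
    """Number of candidate positions in an exhaustive translation search.

    footprint_px   : image side length in pixels (square footprint assumed)
    min_overlap    : required common overlap per axis, as a fraction in (0, 1]
    accuracy_px    : grid spacing, i.e. the required registration accuracy
    rotation_steps : number of trial rotation angles (1 means no rotation)
    """
    if not 0.0 < min_overlap <= 1.0:
        raise ValueError("min_overlap must be in (0, 1]")
    if accuracy_px <= 0 or rotation_steps < 1:
        raise ValueError("accuracy_px must be positive and rotation_steps >= 1")
    max_shift_px = footprint_px * (1.0 - min_overlap)
    steps_per_axis = round(max_shift_px / accuracy_px)
    return steps_per_axis * steps_per_axis * rotation_steps


print(search_grid_size(1000, 0.6))                   # 160000, the case in the text
print(search_grid_size(1000, 0.6, accuracy_px=0.2))  # 4000000 at 1/5 pixel accuracy
```

Under these assumptions, tightening the accuracy from 1 pixel to 1/5 pixel multiplies the search by 25, and each additional trial rotation angle multiplies it again, which illustrates how quickly the searching requirement grows with accuracy.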



[Figure 3 axis label: Searching Requirement in Iterations]

Figure 3. Complexities space covers the accuracy, registration, and algorithm
searching requirements for registration and conflation

The complexity variation can be quite dramatic from the origin to the
solution point in Figure 3, because of the complexity dependency on the
underlying model and quality of the data sources. Given specific data sets
and conflation objectives, the complexity space provides a first level
framework to guide the conflation process. For example, when a particular
objective calls for a very complex model, huge amounts of data, and a very
high accuracy requirement, the first level AVDF analysis in the complexity
space may indicate the impossibility of solution attainment. Subsequently,
the complexity space may provide visual decision making alternatives for a
less demanding solution.

The complexity space provides a framework to view, identify, establish,
and partition the conflation objectives. The conflation levels in the next
section are proposed to provide a process and mechanism to navigate through
the compartmentalized complexity subspace when a conflation objective is
partitioned into actionable conflation levels. Disparity analysis provides a
mathematical framework and quantities for constructing a practical process
and its associated algorithms to estimate and/or solve the conflation problem
at a specific level. The goal of the AVDM framework is to put the complexity
space, conflation levels, and disparity structure analysis into the proper
geospatial region of interest, scale and boundary. For example, in the case
of an emergency operation, a high-level decision-making official needs only
to know an approximate location for a large-scale environmental disaster,
such as a large oil tanker spill, to be able to manage resource allocation,
whereas a firefighter needs to know the exact location to be able to rescue
potential victims successfully. In addition, this AVDF provides a mechanism
to categorize and compare different conflation scenarios; e.g., planning an
air strip for air trafficability vs. defining a route for tank columns in a
hostile environment. Different conflation scenarios have different needs in
visual and analytical information content integration from multiple
information sources for decision making. Synergistic integration of analytic
and visual decision-making provides a mechanism to elucidate conflation
scenarios better than using either one individually.



4. CONFLATION LEVELS 

Conflation level partition subdivides the complex solution space so that 
conflation tasks can be handled recursively through a hierarchical structure 
where each individual level will have well-defined processes, methods, algo- 
rithms and tools. 

The first-order division of conflation levels is twofold: Upper and Lower
Level divisions. The goal of Lower Level conflation is the diagnosis of
mutual information and disparities in the data; mathematically speaking, to
provide the appropriate variables and sub-datasets for integration. The goal
of Upper Level conflation is to provide the guidelines or objectives;
mathematically speaking, the objective functions. The grouped conflation
levels described above are depicted in Figure 4, where the upper conflation
levels (decision levels) provide a guideline for designing a formalized
objective function. Lower levels (measurement/signal levels) can examine data
disparities such as the absence of an exact match of coordinate values and
missing features.



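[Figure 4 diagram labels: upper level conflation (decision making level,
decision support level); lower level conflation (pixel group level, pixel
level, subpixel level); propagation of decision making goals (objective
functions) to lower levels; propagation of discovered disparities to be
resolved.]

Figure 4. Conflation levels: propagation of decisions and disparities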

The Upper Level conflation is further subdivided into Decision-Making 
and Decision-Support Levels. 

The goal for Decision-Making Level is to provide the context type 
match based on input from a decision maker about parameters for the task; 
e.g., errors in the road elevation should not be larger than 5m for supporting 
trafficability. 

The tasks for the Decision-Support Level typically include change
detection, feature discovery and target location. On this level data sources
are matched/conflated to identify objects needed for decision making.

The Lower Level tasks partition the analytic aspect of conflation into
Pixel Group, Pixel and Subpixel Level matching.

The Subpixel Level match is at the signal level; the match of signals at
this level may come from the same type of physical sources but may be
separated in time and space.

The Pixel Level conflation studies the relationship with pixels as a
whole; e.g., building corners are treated as pixel points disregarding the fact
that the intensity of a pixel could be a mixture of signals from different
physical sources.

The Pixel Group Level conflation investigates geometric relationships
together with topological information. At the Pixel Group Level, feature
properties, e.g., roads, and their abstract mathematical quantities become an
integral part of the conflation resolution.

At times, correspondences exist between the ancillary information for
data sets that are well calibrated internally, e.g., GPS information on a
sensor location and precise viewing geometry for the sensor. For this type of
scenario, the conflation can be performed at an in-between level called the
Metadata Level match. Below we show examples of different levels. Typical
cases of decision level conflation are (1) a rescue operation when life is
at stake and (2) the trafficability from location A to B through the area C
via ground, air, or water. The trafficability problem has several
sub-problems that have different requirements for conflation:

• Military: Advance heavy armor column;

• Geology: Relocate oil & gas drilling platform;

• Rescue: Transport rescue teams to earthquake and hurricane locations;
population evacuation;

• Engineering: Transport construction equipment to the high voltage power
line corridor; and

• Engineering: Construct a highway.

Examples of decision support level conflation tasks are Change Detection, 
Feature Discovery, and Target Location (see Table 1 for more detail). 



Table 1. Decision support level conflation tasks

Tasks              Subtasks
-----------------  ----------------------------------------------------------
Change Detection   Military: Damage assessment
                   Geology: Plate movement
                   Rescue: Disaster assessment
                   Agriculture: Crop yield assessment

Target Location    Military: Identify/illuminate target using GPS/laser,
                   conflated map and imagery
                   Geology: Locate a site for new drilling
                   Rescue: Locate the epicenter and most severe damage
                   location
                   Agriculture: Locate and assess diseased crops

Feature Discovery  Military: Discovery of a new WMD facility
                   Geology: Discovery of a volcanic belt
                   Rescue: Discovery of a volcano using aerial photo and
                   orbital imagery
                   Geography: Discovery of land use categories (crop,
                   forest, etc.)


The example shown in Figure 5 illustrates the use of conflation levels.
The decision level task here is to move troops from the blue point (see small
circle) to the red point (see large circle) using the fastest route (using or
not using roads). To solve this task, imagery and a map should be integrated
and their features conflated. This is a decision level conflation problem. In
this example:

• the decision support level task is to identify feature types (e.g.,
orchards, rivers, buildings) that help to define the off-road route
(obstacles or camouflage);

• the pixel group level task is to locate the features (e.g., roads,
bridges, orientation of the orchards);

• the pixel and subpixel level task is to locate geometric attributes
(e.g., line segments and points) and the usability of a specific feature.




[Figure 5 annotations. The decision level task: move troops from the blue
point to the red point using the fastest route (using or not using roads).
Conflict pertaining to objective: roads have curvature, slope and width on
the imagery different from the map, thus evaluation of the shortest path may
provide conflicting results.]

Figure 5. Decision level conflation problem. See also color plates.

Each of these levels $L_i$ has its own conflict between map and imagery that
needs to be resolved. The criteria for resolving conflicts at level $L_i$
come from the upper level $L_{i+1}$. Table 2 below illustrates the types of
conflict for each level.



5. SCENARIO OF CONFLATION 

Below we discuss a conflation scenario that is based on the concept of
the Gold Standard. A Gold Standard (GS) is a description of a conflation
situation for which there exists a suitable conflation method. Thus, it is a
description of both the situation and the conflation method. It is not
necessary for the method to be as precise as that provided by GPS. For
instance, for two satellite images with metadata on sensor model and
location, the gold standard could be an orthorectification model. We consider
both a conceptual orthorectification model and a model populated with actual
parameter values using the description of the data source (sensor model and
location).

The Knowledge Base (KB) contains descriptions of data sources (DDS)
to be conflated and prior knowledge (PK). Prior knowledge is information
that can be used to populate a GS for a data source type. It differs from
data that comes directly from image metadata.



Table 2. Conflation levels and associated conflict types

Level                     Conflict type
------------------------  ---------------------------------------------------
Level 6, L6               Conflict pertaining to objective: Roads have
Decision level            curvature, slope, width on the imagery different
                          from the map, thus evaluation of the shortest path
                          may provide conflicting results.

Level 5, L5               Conflict pertaining to feature set: The map may or
Decision support level    may not show the terrain condition but the imagery
                          will have the information. On the other hand,
                          satellite imagery does not have feature names for
                          navigation but the map does.

Level 4, L4               Conflict pertaining to inaccuracies of data on
Metadata level            sensor location and sensor model.

Level 3, L3               Conflict pertaining to individual feature: the map
Pixel group level         has approximate information but the imagery has
                          detailed and up-to-date information; e.g., a bridge
                          and roads are under construction.

Levels 1-2, L1, L2        Conflict pertaining to geometric attributes of a
Pixel and Sub-pixel       given feature: the width of the roads and bridges
level                     and the spacing of an orchard.



DDS can include metadata if available. Some pairs (DDS, PK) are
matched with an individual GS by a matching function, M,

M(DDS, PK) = GS.

Function M is not fully defined: for some DDS and PK there may be no Gold
Standard. One reason could be that DDS and PK are incomplete.
Function M should be computed for every specific DDS and PK to produce a
GS. The conflation scenario based on the Gold Standard consists of four
steps:

(1) Identification of Conflation Situation (CS) that includes identifica- 
tion of DDS and PK, 

(2) Identification of Gold Standard (GS) by computing 
M(DDS, PK) = GS. 

(3) Conflation using Gold Standard (This step is abbreviated as CG), 

(4) Assess disparities for the feature of interest.

The huge variety of possible problems that require conflation motivates the
introduction of the Gold Standard concept. Several categories of such
problems from different domains have been described in the previous sections.
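The partial matching function M and the four scenario steps listed above can be illustrated with a minimal Python sketch. The dictionary keys, the gold standard label, and the function names are hypothetical placeholders chosen for illustration; they are not part of the rule base described in this chapter.

```python
from typing import Dict, Optional, Tuple

# Hypothetical knowledge base: (DDS, PK) pairs mapped to a gold standard label.
# The labels are illustrative assumptions, not entries from the chapter.
GoldStandard = str
KB: Dict[Tuple[str, str], GoldStandard] = {
    ("satellite_image_with_metadata", "sensor_model_and_location"):
        "orthorectification_model",
}


def match_gold_standard(dds: str, pk: str) -> Optional[GoldStandard]:
    """Partial matching function M(DDS, PK) = GS; None where M is undefined."""
    return KB.get((dds, pk))


def conflation_scenario(dds: str, pk: str) -> str:
    """Walk through the four scenario steps for one (DDS, PK) pair."""
    # Step 1: identify the conflation situation (here simply the pair itself).
    situation = (dds, pk)
    # Step 2: identify the gold standard by computing M(DDS, PK).
    gs = match_gold_standard(*situation)
    if gs is None:
        # M is undefined for this pair, e.g. because DDS or PK is incomplete.
        return "no gold standard: enhance DDS/PK or consult an expert"
    # Steps 3 and 4 (conflation and disparity assessment) are placeholders.
    return f"conflate using {gs}, then assess disparities"
```

Returning None for an unmatched pair makes the "not fully defined" property of M explicit, so the caller has to handle the missing gold standard rather than silently proceed.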

Creating the task-specific conflation methods that we argue for (see
chapter 17, section 2) has many advantages for properly solving conflation
problems, but it is not easy to implement given the huge number of
different tasks that need conflation. Covering just a few “important” specific
tasks can be only a temporary solution. The situation is not static: new tasks
can become “important”.

The Gold Standard approach tries to solve this problem by creating gold
standard tasks, where all information of the given type is known and a
method based on this information is known. For instance, if the
meta-information about the sensor models and sensor locations is known for two
panchromatic satellite images, classical photogrammetric methods of
orthorectification can be applied using standard GIS software such as
ArcMap. Thus, the Gold Standard approach is a new advance in implementing
task-specific conflation. Creating a set of conflation gold standards is
more realistic than creating specific conflation methods for every possible
conflation task.

In the gold standard approach, the first step is the identification of a
conflation situation in enough detail to identify a gold standard for the
task at hand in the second step. More specifically, the conflation steps are:

• Step 1: Establish the gold standard for conflating specific input
sources for the task at hand;

• Step 2: Populate the gold standard with data from input source features;

• Step 3: Conflate input sources using the gold standard;

• Step 4: Assess disparities for the feature of interest in the input sources.

These steps are shown in more detail in Table 3.



Table 3. Conflation scenario steps

| Step | Description |
|---|---|
| Step 1 | Identification of the lower level Conflation Situation (matching with predefined categories of conflation situations). |
| Step 2 | Identification of gold standard. Step 2.1: Match the Conflation Situation with one of the predefined Gold Standard categories. Step 2.2: Populate the gold standard with feature sets from data sources. |
| Step 3 | Conflation using Gold Standard. Step 3.1: Conflate input sources using the gold standard. Step 3.2: Assess disparities for features of interest in data sources. Step 3.3: Modify conflation to decrease disparities. |
| Step 4 | Accumulation of conflation knowledge base. |


The steps are looped repeatedly in the process of refining conflation and
resolving conflicts and disparities. Figure 6 shows this loop.

Figure 6. Conflation loop cycle

These conflation scenario steps can be implemented in a rule-based 
framework with several categories of rules: (1) rules for conflation situa- 
tions and knowledge updates, (2) rules for references configuration using 
GS, and (3) rules for conflation using GS. The purpose of the first step in 
conflation resolution is to identify the type of conflation that matches the 
current mission specific task at hand. 

A rule can be a relatively simple propositional statement or a complex
model M with clearly defined model use conditions C:

If <condition C is true> then <use model M>. 

A set of rules forms a virtual expert rule base that is part of the confla- 
tion knowledge base. A specific conflation scenario for the task at hand is 
identified through applying rules that serve as guidelines. 
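As a minimal sketch of the rule form "If <condition C is true> then <use model M>", a rule can be held as a pair of callables and a rule base as an ordered list. The class and function names are illustrative assumptions, and the two sample rules only mirror the hardcopy and sensor-parameter cases discussed later in the chapter.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Facts = Dict[str, Any]


@dataclass
class Rule:
    name: str
    condition: Callable[[Facts], bool]  # condition C
    model: Callable[[Facts], Any]       # model M, used only when C holds


def apply_first_matching(rules: List[Rule], facts: Facts) -> Optional[Any]:
    """Return the result of the first rule whose condition is true, else None."""
    for rule in rules:
        if rule.condition(facts):
            return rule.model(facts)
    return None


# Illustrative rule base (names and fact keys are assumptions).
rule_base = [
    Rule("hardcopy", lambda f: f.get("hardcopy", False), lambda f: "digitize"),
    Rule("sensor", lambda f: f.get("sensor_parameters", False),
         lambda f: "apply sensor model"),
]
```

A None result corresponds to the situation in which no stored rule applies and a new scenario has to be built with a domain expert.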

If the identified scenario matches closely with one in the database of sce- 
narios, this particular conflation task becomes a case of applying an existing 
methodology by following the stored set of rules in succession. Figure 7 il- 
lustrates a rule-based approach employing CS and GS. In more detail, rules 
listed in this figure are described in Section 6. When a closely related rule is 
not in the Virtual Expert knowledge base, it is necessary to work with an 
expert in the domain to build a new scenario; i.e., the Rule of CS-GS2 and 
associated Virtual Expert System updates along with its quality evaluations. 

Building a new conflation scenario is a two-step process: first establish
the GS, then carry out conflation resolution. The GS
represents a first-order solution, because a perfect solution is unobtainable
once conflicts are assumed to exist. This first-order solution
can also be viewed as a framework, foundation, or boundary condition.

Photogrammetry models are good candidates for the GS because of their
“rule of thumb” nature and because they require only a few points to
satisfy the necessary modeling conditions.

Without the GS, the subsequent conflation resolution runs the risk of
producing a solution without context. Once the GS is established, each data
point from the conflation source pairs has well-defined coordinates that
become the starting point for the subsequent conflation process.

Figure 7. Top-level view of Gold Standard based conflation elements and
their links; see Section 6.2 for details of specific rules



There are three possibilities in establishing the GS, in order of increasing
difficulty: 1) the data sources are rectified, 2) a trusted photogrammetry model
exists for the data source, and 3) there is very little existing information about
the data sources. Consulting experts from the photogrammetry field can
counteract the increasing difficulty. Upon successful conflation, the
solution to the new problem naturally becomes part of the system for future
reference. In accordance with the definitions introduced above, one of the
conflation rules is formulated as follows:

If the description of data sources (DDS) to be conflated and prior
knowledge (PK) about them are in the Knowledge Base (KB), then compute a
matching function M that provides a Gold Standard (GS); otherwise apply
another rule. The second rule tests whether both <DDS, PK> and additional
information about <features> in images are unavailable, and suggests
calling an expert or enhancing the available DDS and PK.

More exactly, the two rules are presented below.

Rule GS1 “Computing GS”:

If pair <DDS,PK> is in KB then compute M(DDS,PK)=GS
else apply rule CS-GS2.

Rule GS2 “Expert matching GS”:

If pair <DDS,PK> and <features> are not in KB

then call <expert> to match <DDS,PK> with some GS or enhance

<DDS,PK> to pairs that are in KB.

6. RULES FOR VIRTUAL IMAGERY EXPERT

A virtual imagery expert system is a system that intends to assist a real 
imagery expert in solving imagery analysis problems including conflation 
and registration problems. Below we illustrate such a system by presenting a 
conflation rule base. 

6.1 Rules for identification of conflation situation 

The major source for identifying the conflation situation is the available
data source description. An upper-level identification criterion for the
conflation situation is given by a simple predicate Available(<data source>)=Y/N.

On the next level of detail the same predicate is applied to Metadata and
Prior Knowledge:

Available(<Metadata>)=Y/N, Available(<Prior Knowledge>)=Y/N.

Here Metadata describe a specific dataset, whereas Prior Knowledge may be
applicable to a variety of datasets.

Next, it is assumed that the datasets to be conflated are available, so we do
not need to test the predicates Available(<datasets>), but we do need to identify
the types of data available: hardcopy, data that describe geospatial features
(feature-based data), physical sensor parameters, Earth parameters, or data about
a specific object.

These data types can be represented by predicates, for instance,
Earth_parameter(<data source>)=Y/N and Hardcopy(<data
source>)=Y/N. Table 4 and Figure 8 show upper-level rules for enriching data
sources and the conflation situation so that the data can be conflated.



Table 4. Upper level Data Source (DS) rules

| Rule | IF-part (condition) | THEN-part (action) |
|---|---|---|
| CS1 | Hardcopy(<data source>)=Y | Digitize |
| CS2 | Features-based(<data source>)=Y | Obtain features |
| CS3 | Physical_sensor_parameters(<data source>)=Y | Apply sensor model |
| CS4 | Earth_parameters(<data source>)=Y | Apply environment model |
| CS5 | Specific_object(<data source>)=Y | Obtain object model |

Figure 8. Links between conflation situation rules



Detailed CS rules are more specific and include a variety of parameters, 
such as parameters of physical sensors, parameters of a feature extraction 
mechanism, and feature relations. For instance, some rules can be applicable 
only if edges do not overlap. 

Other rules may require that spatial (lenses or focal planes) and spectral
calibration models be known. Further rules may require different derivatives
(partial, mixed, second and higher orders) for edge detection (Sobel, Canny,
Laplace methods). Some rules assume that the noise is symmetrical and
normally distributed.

The following rule is representative of lower-level rules based on a
pixel-group level match (e.g., road pixels). The intent is to diagnose disparity in
features at the lower levels and to resolve the disparity at the decision support
level by applying criteria known only at this upper level. For instance, for a
trafficability task we may have two roads, R1 and R2. Both roads have a
disparity of 1 m between the two sources. This disparity is discovered at the
lower pixel-group level. At the lower level, it is not known whether either of
these roads can be accepted for trafficability. At the decision support level, any
conflation resolution may be accepted for the trafficability task if the disparity
is assessed to be less than 1 m.

If additional information indicates that road R1 goes to point B and road
R2 goes to point D, and if at the decision level it is known that the goal is to
reach point B, then the conflation resolution should be in favor of road R1.
Thus, the inconsistency diagnosed in the conflation situation at the lower
level is resolved by applying criteria from upper levels.
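The R1/R2 example can be sketched in a few lines of Python that separate the lower-level diagnosis from the decision-level resolution. The field names and the two helper functions are hypothetical and serve only to illustrate how a criterion known at the upper level (the goal point) settles a disparity that the lower level can only report.

```python
from typing import Dict, List, Optional

# Illustrative data for the two-road example; field names are assumptions.
roads: List[Dict] = [
    {"name": "R1", "disparity_m": 1.0, "destination": "B"},
    {"name": "R2", "disparity_m": 1.0, "destination": "D"},
]


def diagnose_disparities(roads: List[Dict]) -> Dict[str, float]:
    """Pixel-group level: report the disparity per road, make no decision."""
    return {road["name"]: road["disparity_m"] for road in roads}


def resolve_for_goal(roads: List[Dict], goal: str) -> Optional[str]:
    """Decision level: favor the road that reaches the goal point."""
    for road in roads:
        if road["destination"] == goal:
            return road["name"]
    return None
```

With the goal set to point B, the two roads have identical disparities at the lower level, yet the decision-level criterion selects R1.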

The advantages of using AVDM in the stated real-world scenario derive
from the guidance of conflation levels. The conflation-level-based approach aids
the conflation process by providing a mechanism to group the types of disparities
to be matched and resolved with appropriate decision support requirements. Thus,
AVDM provides a framework for computing and allocating resources for
only the relevant disparities.

6.2 Use of gold standard rules

The focus of this section is on the links between rules, in addition to
what is provided in Figure 7. In essence, a virtual imagery expert can be
thought of as a closed operating system complete with a set of linked
hierarchical rules. Assuming that the rules are built first, the mission specific
approach (MSP) makes use of the rules to evaluate the disparity between two
imagery data sources. The logic of dealing with disparity is presented in Table 5
with two rules: GS1 and GS2.

Rule GS1 can be further specified in its then-part. “Evaluate <disparity>”
can include three steps, also encoded as rules:

• Derive <disparity(data source1, data source2)>,

• Model <disparity>, and

• Evaluate <disparity>.



Table 5. Evaluate disparity using Gold Standard

| Rule ID | Name | If | Then |
|---|---|---|---|
| GS1 | Evaluate disparity | <data source1> is converted to <GS> and <Features are in X,Y,Z coordinates> and <data source2> is converted to <GS> and <Features are in X,Y,Z coordinates> | evaluate <disparity> and set flag=1 if disparity is low and { set flag=0 if disparity is high and use Rule CG2 } |
| GS2 | Modify | <disparity> is high | obtain more information from <expert> or <other sources> for adjusting <GS> |



Figure 9 shows links between rules for building conflation rules based on
a gold standard and the analysis of prior knowledge. The logic of rule CG1 is
depicted in Figure 10.

Figure 9. Links between rules for building gold standard

Figure 10. Links between conflation rules based on gold standard



6.3 Rules for GS identification 

It is natural to require that all information sources be addressable by the
<x,y,z coordinates> spatial part of the GS, although these can be relative. Thus,
a key attribute of the rules for the virtual expert system is to build a <function
F> that provides geo-reference coordinates <x,y,z>.

The theoretical concept of the Gold Standard at the rules level is to identify
specific and appropriate means that provide geo-referencing, such as GPS, to
facilitate building <function F> for computing geo-references. Informally,
the concept of the gold standard comes from the idea that GPS provides a
<gold_standard> for geo-location, e.g., for targeting.

Tables 6 and 7 contain rules that are designed to work in this
environment. Table 6 deals with prior knowledge that can lead to an
appropriate <function F>. For instance, <prior knowledge> in rule GS5
could be that image 1 has <a crater>; then a crater (bomb or gas explosion)
expert could be called, whose knowledge can help to match the crater in one
image with the original unaltered information from the other to provide
corresponding locations between the two.



Table 6. Rules for getting rules and models

| Rule | If | then | else |
|---|---|---|---|
| GS3 | There is <GS> for <data source> with defined <x,y,z coordinates> and <geometric primitives> | convert <data source> to <GS> and apply Rule CG1 | apply Rule GS4 |
| GS4 | There are <x,y,z coordinates> and <geometric primitives> suitable for <data source> | set flag=1 and define <x,y,z coordinates> and <geometric primitives> for <data source> | use Rule GC6, set flag=0 and call <experts> for <rules> on <prior knowledge> or <conclusion> |
| GS5 | Rule GS4 fails | set flag=0 and call <experts> for <rules> on <prior knowledge> or <conclusion> | |

In Table 7, the <Ph_model> notation stands for a Photogrammetry Model
that includes orbit and sensor information. In this table, Rule GS12 is
described by using the concept of modality, denoted as <QQQ>.
Modality <QQQ> could stand for <LIDAR>, <Vector Image>, and others.



Table 7. Rules for coordinates <x,y,z> and photogrammetric models

| Rule | If | then | else |
|---|---|---|---|
| GS6 | <prior knowledge> is a trusted <Ph_model> | use <Ph_model> as <prior knowledge> in Rule GS2 | construct <Ph_model> by using Rule GS7 |
| GS7 | <data source> is suitable to use <Ph_model> | select <Ph_model> and calculate <Ph_parameter> and convert <Ph_model info> to <Ph_model> for use by Rule GS2 | |
| GS8 | <1:1 function F> from <data source> to a <x,y,z> reference coordinate exists but may not be known | attempt to identify function F as <GS> | attempt to create a new <function F> as <GS> |
| GS9 | <function F> has <new argument> | attempt to identify <function F> with this new argument | request a <function F> from <expert> or build a local function using Rule GS13 |
| GS10 | <Ph_model> is <plane_to_plane> or <3D_to_2D> | select <projective_model> | if <Ph_model> is <3D_to_3D> then select <conformal_model> else use standard coordinates |
| GS11 | <image rectified> | identify <scale> and use Rule CG1 | convert image to <GS> using <aspect ratio> or <measuring units> or <variation param> and apply Rule CG1 |
| GS12 | <modality type> is <QQQ> and <QQQ parameter set> is matched completely with <GS parameter set> | set flag to 1 | set flag to 0 and extract <GS parameters> set from <data source>, or use <analytic model>, or make <assumption> on <GS> parameters |
| GS13 | <Ph_model info> of <data source> of modality <QQQ> is complete and applied for the whole image | convert <data source> to <x,y,z> | if <z values> are <missing> then make <assumptions> |



Table 8 presents rules for the case when the geo-referencing <function F> does
not apply to the whole image. It deals with areas of interest (AOI) and with areas
of conflation (AOC). Rule GS18 deals with a specific type of area, <flood_type>.
Other types could be <war_type> or <fire_type>.

Table 8. Area of interest rules

| Rule | If | then |
|---|---|---|
| GS14 | <function F> does not apply for the whole area | define <Area of Conflation> |
| GS14a | Too many <control points> or <lack of good strategy> for defining <AOC> | attempt to select <sample strategy>; if this fails, call an expert to create <strategy> for defining <AOC> |
| GS17 | <Change occurred> from <image 1 to image 2> | mask out <changed area>, mask in <useful features> from <mask out>, and if <expert can help> encode <expert's input> to <AOC> |
| GS18 | <Flood type> | get <DEM>, mask out <lower elevation>, mask in <tall feature> |


7. CASE STUDY: PIXEL-LEVEL CONFLATION 
BASED ON MUTUAL INFORMATION 

In this section we describe a pixel-level case study of the multilevel
conflation approach. Feature-based decision support level cases are presented in
other chapters. Signal quantization to pixels is one of the error sources that
conflation of images should deal with. The quality of a digitally recorded
image signal depends on quantization characteristics such as the number and
size of discrete units (pixels) per image and the number of bits per pixel.

A digital number (DN) is the number identified in the process of
analog signal quantization. The digital number is characterized by the
number of bits n allocated to encode the signal value. The resolution of the
analog-to-digital (A/D) conversion is determined by n.

If n bits are used per pixel, then 2^n intensity values can be encoded in the
image; thus an 8-bit image is limited to 2^8 = 256 intensity values. For larger n,
such as 16 or 32, the number of intensity values is 2^16 or 2^32, respectively.
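The relation between bits per pixel and the number of encodable intensity values is a one-line computation; the short Python sketch below simply evaluates it for the three bit depths mentioned above (the function name is an illustrative choice).

```python
def intensity_levels(bits_per_pixel: int) -> int:
    """Number of distinct intensity values encodable with n bits per pixel."""
    return 2 ** bits_per_pixel


for n in (8, 16, 32):
    print(f"{n}-bit pixel: {intensity_levels(n)} intensity values")
```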

Each individual wave band can be displayed by one intensity parameter
using an n-bit digital number. Any three bands can be combined using
a composite of three “false” colors by assigning red to the first band,
green to the second band, and blue to the third band. This requires a 3n-bit
DN to encode the contribution of all colors. In vegetation analysis, a standard
false color composite uses red, green, and blue to display the near-infrared,
red, and green bands, respectively.

7.1 Mutual Information: theory and applications to
conflation



7.1.1 Entropy 

The concepts of Mutual Information stem from information theory. 
Shannon’s Information Theory deals with the fact that the “actual message is 
one selected from a set of possible messages”. The fundamental concept in 
Shannon’s information theory is that information is conveyed by random- 
ness. Information from Shannon’s theory is defined as a measure for the 
statistical dependency of messages selected from a set of possibilities. The 
focus of mutual information, however, is on “two messages”. While infor- 
mation theory has been successful in serving the needs of communication, its 
application in imagery processing and pattern recognition has been limited 
because of the difficult challenge in formulating the required probability 
density function. 

Conflation provides a unique imaging application for mutual information 
(MI) when it is considered as a subject of studying some finite combinations 
of parameters in a given model from observations separated in space, time, 
and wavelength. The challenge of applying MI to conflation types of appli- 
cations in imagery and its derived information sources lies in the area of 
quantifying and evaluating information beyond mean square error criterion. 
This case study exploits MI for image conflation. 

The basis of information theory is entropy, H, which characterizes
information or its rate of production [Shannon, 1948; Shannon & Weaver, 1963].

Suppose a random variable $X=\{x_i\}$ with values $x_i$ ($i=1,\ldots,n$) has
probabilities of occurrence $p_1=p(x_1), p_2=p(x_2), \ldots, p_n=p(x_n)$. The
entropy measure H is defined as

$$H(X) = -K \sum_{i=1}^{n} p_i \lg p_i \qquad (8)$$

where K is a positive constant that amounts to a choice of unit of measure. H is
continuous in the $p_i$ and $H(X) \le K \lg n$. If all $p_i$ are equal
($p_i = 1/n$), then H is a monotonically increasing function of n,
$H(X) = K \lg n$.
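Equation (8) can be checked numerically with a few lines of Python. The sketch below assumes the base-2 logarithm for lg, so that with K = 1 the result is in bits; the function name and the convention of skipping zero probabilities are illustrative choices.

```python
import math
from typing import Sequence


def entropy(p: Sequence[float], k: float = 1.0) -> float:
    """H(X) = -K * sum_i p_i * lg(p_i), with lg taken as log base 2.

    Terms with p_i = 0 are skipped, following the usual convention 0*lg(0) = 0.
    """
    return -k * sum(pi * math.log2(pi) for pi in p if pi > 0.0)
```

For a uniform distribution over n = 8 values, `entropy([1/8] * 8)` evaluates to lg 8 = 3 bits, the upper bound K lg n stated above; any non-uniform distribution over the same 8 values gives a smaller value.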

7.1.2 Mutual Information (MI) 

Suppose there are two random variables, X and Y, with n possible values
$x_i$ for the first and m possible values $y_j$ for the second. Let

$$p_{ij} = p(x_i, y_j)$$

be the probability of the joint occurrence of $x_i$ for the first and $y_j$ for
the second. The joint entropy of the two variables X and Y (taking K = 1) is

$$H(X,Y) = -\sum_{i,j} p_{ij} \lg p_{ij},$$

and it satisfies

$$H(X,Y) \le H(X) + H(Y),$$

with equality only if the variables are independent, i.e., $p_{ij} = p_i p_j$,
where $p_i$ and $p_j$ denote the marginal probabilities of $x_i$ and $y_j$.

Mutual Information MI(X,Y) is the relative entropy between the joint
distribution and the product distribution:

$$MI(X,Y) = \sum_{i,j} p_{ij} \lg \frac{p_{ij}}{p_i p_j}. \qquad (9)$$

Note that MI(X,Y)=0 when X and Y are independent variables, because in
this case $p_{ij}/(p_i p_j) = 1$ and $\lg 1 = 0$. Mutual Information MI(X,Y) is
related to Shannon's rate of transmission R, which is obtained by subtracting
from the source entropy the amount of additional information that must be
supplied to correct the errors:

$$R = H(X) + H(Y) - H(X,Y) = MI(X,Y). \qquad (10)$$
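The two expressions for mutual information, the relative-entropy form of Eq. (9) and the entropy form of Eq. (10), can be compared numerically. The Python sketch below assumes base-2 logarithms and a joint probability table given as a list of rows; the function names are illustrative.

```python
import math
from typing import Iterable, List


def _entropy(probs: Iterable[float]) -> float:
    """Entropy in bits of a collection of probabilities (zeros skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def mutual_information(joint: List[List[float]]) -> float:
    """Eq. (9): sum_ij p_ij * lg(p_ij / (p_i * p_j)) for a joint table p_ij."""
    px = [sum(row) for row in joint]        # marginal of X (rows)
    py = [sum(col) for col in zip(*joint)]  # marginal of Y (columns)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pij in enumerate(row):
            if pij > 0.0:
                mi += pij * math.log2(pij / (px[i] * py[j]))
    return mi


def mi_from_entropies(joint: List[List[float]]) -> float:
    """Eq. (10): H(X) + H(Y) - H(X,Y) computed from the same joint table."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    hxy = _entropy(p for row in joint for p in row)
    return _entropy(px) + _entropy(py) - hxy
```

For the independent table `[[0.25, 0.25], [0.25, 0.25]]` both functions return 0, and for the fully dependent table `[[0.5, 0.0], [0.0, 0.5]]` both return 1 bit.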

The state-of-the-art registration and conflation processes, methods and algo- 
rithms have synergistic needs in: defining overlapping areas, mitigating 
multiple solutions in parameter space, refining local solutions, and reducing 
computation complexities. 

This chapter presents two basic aspects of MI using a case study approach.
First, we illustrate the potential use of MI for defining the overlapping area
while reducing computational complexity compared with cross correlation
methods. Second, we illustrate the concept of using MI to find conjugate
registration points when cross correlation fails, thus demonstrating its
potential for mitigating multiple solutions in parameter space and refining local
solutions.



7.1.3 Histogram and Entropy 

The first step in MI application is the calculation of the pixel radiance
intensity histogram and entropy. A window of data is selected from
the input imagery to calculate the histogram before it is converted to
entropy. Figure 11 is a Landsat image over Idaho of size 1K x
1K pixels; it represents a typical agricultural area in a Northwest
semi-arid region. The circular features nested in a regular grid in the
imagery indicate a nominal irrigation pattern.

Generally, for the Landsat class of imagery, 256 probability distribution
bins exist because of its 8-bit digitizing precision. Figure 12 shows a set of
histograms formed by dividing the entire imagery into equal quadrangles.
The histogram in the center of Figure 12 represents the whole image.

Figure 11. A 1Kx1K frame of Landsat imagery from an agricultural area in Idaho. Circular
features nested in a regular grid in the imagery indicate a nominal irrigation pattern.




Figure 12. A set of histograms formed by dividing the imagery into the corresponding equal 
quadrangles. The subplot in the center of the figure shows the histogram of the whole image. 






The comparison of these histograms indicates that they are location and 
size dependent (see also Figure 14). The location dependency forms the ba- 
sis for spatial information correlation. The size dependency becomes a factor 
in the efficacy of applying specific methods. 

To reduce the dependency of the histogram of accumulated counts on
window size, the histogram values are divided by the total number of
points in the selected window, providing a probability density function p and
enabling the calculation of entropy. The transformation from the probability
density distribution p to entropy E reduces all the information from the
selected window of data to a single number, i.e., the number of bits over
the area of interest.

This data compression quantifies the information from the
selected window as the average number of bits needed to convey radiance
intensity information for all the image pixels. This averaging nature is
inherited from the normalization process of converting the frequency counts to
probabilities, in addition to the unit of measure provided by the
base of the logarithm in the entropy formula. Figures 12 and
13 illustrate the conversion of a histogram to a probability density function.
The magnitude of vertical axis in Figure 12 is on the order of lO"^, while the 
magnitude of the vertical axis in Figure 13 is normalized to around 1/10 of
the total number of points in the selected window. The horizontal axis
represents pixel intensity values ranging from 0 to 100 out of 256 bins. Pi in Figure
13 indicates the probability for a given intensity digital number bin.
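The histogram-to-probability-to-entropy chain described above can be sketched in Python for a small window of digital numbers. The sketch uses only the standard library and a nested list as the window; the function name and the toy windows in the usage note are assumptions for illustration, not the Landsat data of Figure 11.

```python
import math
from collections import Counter
from typing import Sequence


def window_entropy(window: Sequence[Sequence[int]]) -> float:
    """Entropy in bits of the digital numbers inside one image window."""
    pixels = [dn for row in window for dn in row]
    counts = Counter(pixels)                        # histogram of DN counts
    total = len(pixels)                             # points in the window
    probs = [c / total for c in counts.values()]    # counts -> probabilities
    return -sum(p * math.log2(p) for p in probs)    # Eq. (8) with K = 1
```

A window such as `[[0, 0], [255, 255]]` yields 1 bit and a constant window yields 0 bits. For 8-bit data the result can never exceed 8 bits, which is consistent with the value reported for Figure 13 lying below the 8-bit Landsat precision.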




Figure 13. Illustration of the conversion of histogram to probability. The entropy value
calculated using Eq. 8 is 4.7576 bits, which is below the 8-bit Landsat digitizing precision, as expected.

7.1.4 Use of mutual information in image registration and
conflation

Formulation of mutual information has been evolving since Shannon
formulated information theory in 1948. Maes et al. [Maes, Colignon,
Vandermeulen, Marchal & Suetens, 1997] and Wells et al. [Wells, Viola,
Atsumi, Nakajima, & Kikinis, 1996] successfully introduced it to the
medical field. Studholme [Studholme, 1999] introduced normalization into
MI to further normalize the information content through the ratio of the sum
of the marginal entropies to the joint entropy. These basic
research developments signify the continued refinement of using MI for image
registration. Such evolution comes from the fact that MI is the result of a set of
nonlinear transformations that reduce the first-order dependencies of
intensity on location and window size.




$M(I_1,I_2) = H(I_1) + H(I_2) - H(I_1,I_2) = 4.4067$

Figure 14. Entropy (H1, H2) and mutual information (M(I1,I2)) calculated based on the
quadrangles 2 and 4 from Figure 13. The vertical axis on the left marks the measure for the individual
DN bin, while the vertical axis on the right marks the mutual information in terms of number
of bits.

Registration and conflation involve the relationships between conjugate
pairs of the information sources. Figure 14 illustrates this principle by using
two windows of data, H1 and H2, selected from Figure 12; i.e., the top left and
bottom right quadrangles.

As the word “mutual” implies, the focus is on depicting the information
common to both of the input entities. Vertical cyan
lines in Figure 14 mark the intersection of H1 and H2. The entropies for the
selected quadrangles and their joint histogram are 4.6066, 4.6124 and
4.8123, respectively, which gives a mutual information of
4.6066 + 4.6124 - 4.8123 = 4.4067 bits.

7.2 Computation reduction for overlapping area



First-order registration and/or conflation is of great interest to authors of
any alignment-type algorithm for reasons of computation reduction. The
advantages start from the reduction in the necessary search
space. Additionally, when large numbers of pixels are involved,
convergence is not guaranteed for many registration algorithms due to the potential
for increased error and large intensity variation, both of which contribute to
reduced correlation. For MI to succeed in registration and conflation, it is
necessary to demonstrate its power in computation reduction and its stability
across large radiance variability. The first example in this section
demonstrates the use of MI for computation reduction, while the stability will be
demonstrated in the next example. Figure 15, using the same image as in
Figure 12, demonstrates a situation of finding one registration tie point for
global translation in both horizontal and vertical directions.




Figure 15. Defining imagery overlapping area using the maximum MI criteria. Top left is the 
input imagery. Top right is the image chip for the center of the original imagery. Lower left 
is the results from MI maximization. Lower right pictures the absolute entropy difference 
between a given window from the original imagery and the imagery chip. 









To achieve a desired accuracy of 1 pixel for a predefined 60% coverage of an
image chip without rotation, it is necessary to overlay the image chip on
the original 1K by 1K imagery 400x400 = 160,000 times before a
satisfactory solution is derived. If rotation is needed, considerably more
computation is required. The computation time used in finding the solution of a
(200,200) offset with the window size of 624 by 624 is about 320 units of
CPU time, measured by the Matlab tic and toc utility on a desktop machine.

For comparison, the calculation of a single iteration of a correlation of
the center image chip with itself used more than 0.3 CPU time units on the
same computation setup as in the MI calculation. This 0.3 CPU time duration
translates to more than 42,000 units of total CPU time for searching the entire
160,000 points on the iteration grid. Thus MI shows about a 100-fold
improvement in computation efficiency compared with the correlation-based
method.
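The exhaustive translation search described above can be illustrated on a toy scale. The following Python sketch is an assumption-laden illustration, not the authors' Matlab implementation: it uses a small synthetic 16x16 image with 8 gray levels instead of the 1K by 1K Landsat frame, estimates MI from joint histograms as H(a) + H(b) - H(a,b), and returns the offset at which the MI between the chip and the underlying window is maximal. It does not attempt to reproduce the reported timings.

```python
import math
import random
from collections import Counter


def entropy_of(values):
    """Entropy in bits of a list of hashable samples (pixels or pixel pairs)."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(a, b):
    """MI of two equally sized pixel lists via H(a) + H(b) - H(a, b)."""
    return entropy_of(a) + entropy_of(b) - entropy_of(list(zip(a, b)))


def crop(image, top, left, size):
    """Flattened size x size window of the image starting at (top, left)."""
    return [image[r][c]
            for r in range(top, top + size)
            for c in range(left, left + size)]


def best_offset(image, chip, size):
    """Exhaustive translation search for the offset that maximizes MI."""
    best, best_mi = None, -1.0
    limit = len(image) - size
    for top in range(limit + 1):
        for left in range(limit + 1):
            mi = mutual_information(chip, crop(image, top, left, size))
            if mi > best_mi:
                best, best_mi = (top, left), mi
    return best


# Synthetic test data (assumption): random 16x16 image, chip cut at (3, 5).
rng = random.Random(0)
image = [[rng.randrange(8) for _ in range(16)] for _ in range(16)]
chip = crop(image, 3, 5, 8)
found = best_offset(image, chip, 8)
```

At the true offset the chip and the window coincide, so the MI equals the chip entropy, which is its largest possible value; the search therefore recovers the offset (3, 5) from which the chip was cut.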

7.3 MI for Control Point Selections 

Registration requires control (conjugate) points. At the top level, there
are three different ways of obtaining control points: manual,
autocorrelation based, and mathematical fitting using equations. Autocorrelation is
acknowledged to be the current standard used by many state-of-the-art
software packages. Normally, in registration, cross correlation of radiance
intensities from two or more images uses window sizes on the order of a
dozen pixels. Compared with the size of normal Landsat or SPOT
imagery, on the order of 6K, a local registration tie point derived from a dozen
pixels is of a different order of magnitude from image warping. Developing
and testing algorithms with windows on the order of 100s of pixels will clearly
benefit warping-type registration/conflation tasks at the scale of 100s of
pixels of misalignment.

At the scale of 100s of pixels, misalignment will inevitably contain a
component of rotation. Correlation is not known to be stable when rotation
and scaling exist between the pair of data matrices, particularly when
there is intensity variation. The example illustrated in Figure 16
demonstrates the utility of using entropy instead of radiance intensity for
finding registration tie points using correlation, and thus the advantages
of using entropy and mutual information for registration and conflation.

The image pair in Figure 16 has obvious translation, rotation, and scaling
disparities. There are many readily mappable features in this imagery pair.
Among them, a river bend and a lake are mapped using a dark blue vector.
When the dark blue vector from the river bend is translated to the location of
the white vector mapping the lake, so that the two share a common origin, a
difference vector results, marking the misalignment pointed to by a curved arrow.

The curved arrow in Figure 16 points to the vector representation of
misalignment with components of offset, scaling and rotation. To show the
benefit of MI, this pair of radiance intensity images is first transformed to
entropy using a window size on the order of 60 pixels. The reduction in high
frequency signal from the entropy transformation demonstrates MI's
capability of reducing local variability. Figure 16 also reveals that the
transformation of the radiance intensity to entropy is a non-literal one; e.g.,
both the river (dark pixels) and agriculture (bright pixels) can have medium to
high entropy. The comparison will be demonstrated using correlation of
data windows with size on the order of 600 pixels.
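
The entropy transformation described above can be sketched as follows. This is a minimal illustration only: it assumes that the transform is the Shannon entropy of the gray-level histogram inside a square window, which may differ from equation 8 of this chapter. The function names `window_entropy` and `entropy_image` and the `half` parameter (half the window width) are hypothetical.

```python
import math
from collections import Counter


def window_entropy(image, row, col, half):
    """Shannon entropy (in bits) of the gray levels in a square window.

    The window is centered at (row, col), extends `half` pixels in each
    direction, and is clipped at the image border (an assumption).
    """
    rows, cols = len(image), len(image[0])
    values = [
        image[r][c]
        for r in range(max(0, row - half), min(rows, row + half + 1))
        for c in range(max(0, col - half), min(cols, col + half + 1))
    ]
    total = len(values)
    return -sum((k / total) * math.log2(k / total)
                for k in Counter(values).values())


def entropy_image(image, half):
    """Replace every pixel by the entropy of its surrounding window."""
    return [[window_entropy(image, r, c, half) for c in range(len(image[0]))]
            for r in range(len(image))]


if __name__ == "__main__":
    checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]
    for line in entropy_image(checker, 1):
        print([round(v, 3) for v in line])
```

Under this assumption a uniformly dark window and a uniformly bright window both map to zero entropy, which is one way to see why the transform is non-literal: the output depends on local variability, not on brightness.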




[Figure 16 panels: intensity images (top) and entropy images (bottom, MI window size 60); Image (a) and Image (b).]

Figure 16. Illustration of the offset, scaling, and rotation phenomenon at the scale of warping,
i.e., 100s of pixels. Images (a) and (b) are the time-lapsed image pair. The blue vector maps
corresponding river bend features. The white vector maps the lake. The dark vector maps the
registration vector for the selected features. The lower two figures display the entropy
transform, computed using equation 8, of the corresponding intensity imagery pair. See also color plates.



Figure 17 illustrates the correlation of both the radiance intensity and its
entropy images with a 35-pixel lag in both the horizontal and vertical
directions. Radiance intensity-based correlation is on the left side of the
figure and entropy-based correlation on the right. The correlation window is
centered near the locus of the image. The comparison of these two correlations
indicates that the entropy-based data has a correlation coefficient about 0.1
higher than that of the intensity-based data. Furthermore, entropy shows only
one peak, while the intensity-based correlation appears to have a sharper
peak but with multiple potential solutions. Notice the improvement of the
correlation coefficient of entropy over intensity, and the existence of a
unique peak for entropy.
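
The lag search behind a correlation surface such as the ones in Figure 17 can be sketched as below. This is a minimal sketch under stated assumptions, not the procedure used to produce the figure: it assumes the Pearson correlation coefficient between a fixed reference window and equally sized windows of the second image shifted by every lag up to `max_lag`. The names `ncc`, `crop` and `best_lag` are hypothetical.

```python
import math


def ncc(a, b):
    """Pearson correlation coefficient of two equally sized 2-D windows."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0  # a constant window carries no signal


def crop(img, top, left, size):
    """Square window of side `size` with its upper-left corner at (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def best_lag(ref, target, top, left, size, max_lag):
    """Return (score, dy, dx) of the lag with the highest correlation."""
    window = crop(ref, top, left, size)
    best = (-2.0, 0, 0)
    for dy in range(-max_lag, max_lag + 1):
        for dx in range(-max_lag, max_lag + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > len(target) or l + size > len(target[0]):
                continue  # skip lags that leave the image
            score = ncc(window, crop(target, t, l, size))
            if score > best[0]:
                best = (score, dy, dx)
    return best
```

The same search can be run on intensity windows or on their entropy transforms; only the input arrays change. A surface with several near-equal maxima corresponds to the multiple potential solutions noted for the intensity-based case.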




(a) Intensity based cross correlation (b) Entropy based cross correlation 



Figure 17. Comparison of the cross correlation between intensity and entropy with the rest of
the parameters staying constant.

To further demonstrate the benefit of using entropy and MI, this process
is repeated over a 7 by 7 regular grid centered across the imagery pair. The
results are plotted in Figures 18 and 19, with radiance intensity-based
correlation displayed in Figure 18 and entropy-based results in Figure 19.
Each correlation figure contains a total of 49 subplots, placed according to
their locations within the image. The peaks, or the maximum correlations,
are supposedly the solutions for control registration points.

The advantages are at least twofold: 1) the improvement on computa- 
tional complexity, and 2) the consistent and unique solution for mapping 
control pairs across 100s of pixels. These advantages could potentially pro- 
vide solutions to map overlapping areas and provide the necessary inputs for 
higher order registration beyond standard physical photogrammetry models. 

One of the potential key contributions from the MI based approach is its 
ability to match the required solution for a pre-selected model or method at 
the similar spatial resolution and extent requirements. The ability of MI to 
transform imagery and its derived product to the quantity of information 
domain may provide the necessary mathematical foundation for conflation.

Figure 18. Intensity correlation: non-unique and wrong solution. Radiance intensity-based
correlation from a total of 49 locations centered on a 7x7 grid over the image pair in Figure
16. The line connecting the correlation peaks does not have the same trend as the registration
vector displayed in Figure 16. See also color plates.




Figure 19. Entropy correlation. The trend matches the data. Entropy-based correlation with
the same layout as in Figure 18. The line connecting the correlation peaks does have the same
trend as the registration vector from Figure 16. See also color plates.

8. CONCLUSION

Scientific, economic, and national security interests benefit from
decision making that utilizes multiple information sources. These information
sources are increasingly higher in dimension. Imagery and the
information extracted from it become increasingly complex to integrate
under time and space operation constraints. State-of-the-art methods and
algorithms turn out to be less than adequate to meet the needs of
information analysis and integration. This chapter is intended to be the
beginning of a required systematic approach to address the complexities in
the integration of information derived from imagery. The three important
aspects in addressing information integration are (1) decision-making
framework and data, (2) information quality and/or error analysis, and (3)
methods, rules and algorithms. This chapter defined imagery and geospatial
data conflation and its levels, accompanied by examples.

Figure 3 illustrates a framework incorporating potential contributions 
from the integration of visual and analytic methods. This framework is 
adaptable to specific scenarios of similar cases as well as individual tasks of 
a given application case. It provides a starting point for defining conflation 
scenarios, cases, tasks, and their refinements to meet mission specific re- 
quirements. It is equally applicable to algorithmic selection and computa- 
tional analysis. 

Despite the potential complexities of inconsistencies in any conflation
scenario, the geometric discrepancies are mathematically quantifiable for a
given task and application case. The error analysis indicates that the key to
improving the reduction of spatial disparities lies in decoupling the
rotation component from its location dependency. Correlation has been the
standard in spatial information integration, though it is known to fail in
cases where information is either overwhelming or inconsistent. Mutual
information has the potential to overcome some of the shortcomings of
correlation in conflation. Further work on MI is vital to understand its
advantages and disadvantages for information integration at multiple scales
and across different extents. It is anticipated that large amounts of
knowledge will be accumulated in each of the three areas discussed here as
research activities proceed, and an expert system with the ability to
capture and analyze the performance of a framework and its associated
methods and algorithms will be a great catalyst for information integration
using imagery.

The scenario of conflation was described along with the linked concepts of
a conflation situation and a gold standard for conflation. The conflation
scenario is based on sets of rules: (1) rules for linking a conflation
situation to a gold standard; (2) rules for identification of the conflation
situation; (3) rules for identification of the gold standard; (4) conflation
rules based on the gold standard. Future work includes developing and
implementing more rules and detailed quality assurance tools.

10. EXERCISES AND PROBLEMS

1. Build 3-D error equations by generalizing the 2-D model presented in
equations (1)-(7).

2. Decompose your 3-D errors from exercise 1 in a way similar to what was 
done in equation (7) for 2-D. 

3. Build a 2-D error equation for an affine transform:

$x' = m_{11} x + m_{12} y + m_{13}$
$y' = m_{21} x + m_{22} y + m_{23}$

Tip: use the differential approach taken in equations (1)-(6).

4. Decompose your 2-D errors from exercise 3 in a way similar to what was
done in equation (7) for 2-D.

5. Build new conflation rules similar to those presented in section 6. 


CONFLATION OF IMAGES WITH ALGEBRAIC
STRUCTURES

Abstract: Spatial decision making and analysis heavily depend on the quality of image
registration and conflation. An approach to conflation/registration of images that
does not depend on identifying common points is being developed. It uses the
method of algebraic invariants to provide a common set of coordinates to
images using chains of line segments formally described as polylines. It is shown
that the invariant algebraic properties of the polylines provide sufficient
information to automate conflation. When there are discrepancies between the image
data sets, robust measures of the possibility and quality of match (measures of
correctness) are necessary. Such measures are offered based on image
structural characteristics. These measures may also be used to mitigate the effects
of sensor and observational artifacts. This new approach grew from a careful
review of conflating processes based on computational topology and geometry.
This chapter describes the theory of algebraic invariants and a
conflation/registration method with measures of correctness of feature matching.

Key words: data fusion, imagery conflation, algebraic invariants, geospatial feature,
polyline match, measure of correctness, structural similarity, structural
interpolation



1. INTRODUCTION 

Algebraic invariants form a new methodology that automates the 
combining and correlation/registration of images from many sources with 
various resolutions and reliability, giving them common scales and 
coordinates (for analysis of different approaches see Chapters 17 and 18). 
Algebraic invariants use techniques that are based on matching linear 
features described by mathematically constrained line segments (polylines). 
This method matches polylines using their robust structural characteristics 
instead of more traditional matches based on less robust geometric distances. The new
method permits high speeds and automation since matches are done using 
only a small fraction of the total image data. Structural characteristics are 
measured separately for selected features (this can be done off-line) instead 
of time intensive feature pair comparison as required by other methods. This 
approach can also be used to automate the identification of locations that 
change in time, and be used to automatically search for specific objects of 
interest by predefining their abstracted linear shapes. 

The registration and conflation method based on algebraic invariants us- 
ing polylines may be done in several ways. Consider the two satellite images 
of the Kyrgyz lake Sonkyl in Figure 1. 




Figure 1. Two images of the lake Sonkyl in Kyrgyzstan 

Feature extraction programs can be used to construct numerous polylines, 
with the most obvious one being the shoreline of the lake. The results are 
shown in Figures 2 and 3. 

While the overall structure of the extracted shorelines is apparent, the 
polylines differ in detail for a variety of reasons. Robust ways of comparing 
these polylines are necessary to determine image transformations and to as- 
sess the quality of the result. 




Figure 2. Extracted Sonkyl features 

The angles between segments and individual segment lengths are two al- 
gebraic characteristics of polylines that can be used. For smooth features 
extracted from images with comparable scales and resolutions, either com- 
parison works well. When there are marked differences in image scale and 
resolution, the choice of angles or lengths becomes more important. This 
chapter examines characteristics of extracted polylines and how they may be 
interpolated and compared and used to conflate images. 

Figure 3. Lake shores extracted from the photographs of Sonkyl



2. ALGEBRAIC INVARIANTS 

We considered both topological and geometrical models before deciding 
on an algebraic approach as a major tool for registering images. Features 
within different images have different properties such as the number of seg- 
ments and different beginning and ending points. These variations in the 
same feature make the use of algebraic invariants very attractive, particularly 
when compared to the challenges that they present for other methods. (Our 
comparison of methods is summarized in Chapter 1 .) 

The algebraic invariant approach to conflating images makes the following
assumptions:

(1) The images (satellite images, gravity maps, aerial photos, digital ele- 
vation maps, synthetic aperture radar (SAR), etc.) have no common refer- 
ence points established in advance for matching them. 

(2) The images have different (and often unknown) scales, rotations
and accuracy.

(3) Each image has several well-defined “features” that can be repre- 
sented as polylines (continuous chains of line segments). A feature is a wider 
concept than is commonly used in image sciences. Anything with a rea- 
sonably well-defined shape will work as a feature. A closed polygon is 
also a feature in this sense. The only requirement is that the feature can be fit 
with a polyline. It is not necessary to know what it is or if there are any cor- 
respondences with polylines in other images. 

(4) These well-defined features can be relatively easily extracted. 

The example considered above illustrates the use of polylines as a basis
for applying algebraic invariants to image conflation/registration.

We now describe the major concepts related to algebraic invariants that
are the basis of our techniques, beginning with definitions of terms.

2.1 Algebraic definitions 



Definition. A pair $a = (A, \Omega_a)$ is called an algebraic system [Mal'cev,
1973] if $A$ is a set of elements and $\Omega_a$ is a set of predicates $\{P\}$ and
operators $\{F\}$ on $A$ and on its Cartesian products, where
$P: A \times A \times \dots \times A \to [0, 1]$ and
$F: A \times A \times \dots \times A \to A$.

Definition. A triple $a = (A, R, \Omega_a)$ is called a multisort algebraic
system if $A$ and $R$ are sets of elements and $\Omega_a$ is a set of predicates
$\{P\}$ and operators $\{F\}$ on $A$ and $R$ and on their Cartesian products, where
$P: B_1 \times B_2 \times \dots \times B_n \to [0, 1]$ and
$F: B_1 \times B_2 \times \dots \times B_n \to B_{n+1}$, where each $B_i$ is $A$ or $R$.

A set of axioms can be associated with an algebraic system to generate a 
specific system such as a group, a field or a linear feature. 

Definition. An algebraic system $a = (A, R, \Omega_a)$ is called a linear feature
if $R$ is a set of real numbers and $\Omega_a$ consists of two operators (functions)
$D(a_i)$ and $L(a_i, a_j)$ and three predicates (linear order relations)
$\ge_a$, $\ge_D$, $\ge_L$. Thus
$\Omega_a = \{D(\,), L(\,,\,); \ge_a, \ge_D, \ge_L\}$, where:

1. $\forall a_i, a_j \in A$: $a_i \ge_a a_j$ or $a_j \ge_a a_i$. (All elements of $A$ are
totally ordered.)

2. $D: A \to [0, \infty)$. (An element $a$ is called a linear interval and $D(a)$ is
the length of $a$.)

3. $L: A \times A \to [0, 360]$. ($L(a_i, a_j)$ is called an angle between $a_i$ and $a_j$.)

4. $a_i \ge_D a_j \Leftrightarrow D(a_i) \ge D(a_j)$. (This links $\ge_D$ with $D(\,)$.)
We call elements $a_i$, $a_j$ linear intervals and say that element $a_i$ is no
shorter than element $a_j$ if $a_i \ge_D a_j$, that is, $D(a_i) \ge D(a_j)$.

5. $\forall a_i, a_j, a_k, a_m \in A$: $(a_i, a_j) \ge_L (a_k, a_m) \Leftrightarrow L(a_i, a_j) \ge L(a_k, a_m)$.
(This links $\ge_L$ with $L(a_i, a_j)$.)

Properties:

• $\forall a_i, a_j \in A$: $a_i \ge_D a_j$ or $a_j \ge_D a_i$.

• $\forall a_i, a_j, a_k, a_m \in A$: $(a_i, a_j) \ge_L (a_k, a_m)$ or $(a_k, a_m) \ge_L (a_i, a_j)$.
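
The two operators and the derived order relations of a linear feature can be sketched in code. This is an illustration under assumptions that are not fixed by the definition: intervals are stored as pairs of 2-D end points, and the angle is measured counter-clockwise from the direction of the first interval to the direction of the second. The names `length`, `angle`, `ge_D` and `ge_L` are hypothetical.

```python
import math


def length(seg):
    """D(a): Euclidean length of an interval ((x0, y0), (x1, y1))."""
    (x0, y0), (x1, y1) = seg
    return math.hypot(x1 - x0, y1 - y0)


def angle(seg_a, seg_b):
    """L(a_i, a_j): angle in degrees, in [0, 360), from seg_a to seg_b.

    Assumed convention: counter-clockwise difference of the two headings.
    """
    def heading(seg):
        (x0, y0), (x1, y1) = seg
        return math.degrees(math.atan2(y1 - y0, x1 - x0))
    return (heading(seg_b) - heading(seg_a)) % 360.0


def ge_D(a_i, a_j):
    """a_i >=_D a_j: interval a_i is no shorter than interval a_j."""
    return length(a_i) >= length(a_j)


def ge_L(pair_1, pair_2):
    """(a_i, a_j) >=_L (a_k, a_m): compare the angles of two interval pairs."""
    return angle(*pair_1) >= angle(*pair_2)


if __name__ == "__main__":
    a1 = ((0, 0), (3, 4))
    a2 = ((3, 4), (3, 6))
    print(length(a1), length(a2), ge_D(a1, a2))
```

Because both relations compare real numbers, the two listed properties hold automatically: for any two intervals (or pairs of intervals) at least one direction of the comparison is true.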

2.2 Co-reference: definitions and a theorem

Definition. An algebraic system $a'$ is called an abstracted linear feature
of feature $a$ if $\Omega_{a'}$ consists of three predicates (linear order relations)
$\ge_a$, $\ge_D$, $\ge_L$, with $\Omega_{a'} = \{\ge_a, \ge_D, \ge_L\}$ from the linear feature $a$.

Definition. An algebraic system $e = (E, \Omega_e)$ is a subsystem of an
algebraic system $a = (A, \Omega_a)$ if $E \subseteq A$ and $\Omega_e = \Omega_a$. The subsystem
relation is denoted by $e \subseteq a$.

Definition. An algebraic system $e = (E, \Omega_e)$ is a shared subsystem of
algebraic systems $a = (A, \Omega_a)$ and $b = (B, \Omega_b)$ if $E \subseteq A$, $E \subseteq B$ and
$\Omega_e = \Omega_a = \Omega_b$.

Definition. Algebraic systems $a = (A, \Omega_a)$ and $b = (B, \Omega_b)$ are
co-referenced if they have a shared subsystem $e = (E, \Omega_e)$.

This property is not easy to test because it requires matching equal
elements of $A$ and $B$ in advance, which is a major goal of conflation.

Definition. Algebraic systems $e_a = (E_a, R, \Omega_e)$ and $e_b = (E_b, R, \Omega_e)$ are
isomorphic if there is a one-to-one mapping $g: E_a \to E_b$ such that for every
predicate $P$ and operator $F$ from $\Omega_e$

$\forall e_1, e_2, \dots, e_n$: $P(e_1, e_2, \dots, e_n) = P(g(e_1), g(e_2), \dots, g(e_n))$ and
$g(F(e_1, e_2, \dots, e_n)) = F(g(e_1), g(e_2), \dots, g(e_n))$.

Definition. Linear features $a = (A, \Omega_a)$ and $b = (B, \Omega_b)$ are co-reference
candidates (CRC) if they are homeomorphic and have isomorphic linear
subfeatures $e_a = (E_a, R, \Omega_e)$ and $e_b = (E_b, R, \Omega_e)$, where $e_a \subseteq a$ and $e_b \subseteq b$.

Definition. The number of elements $n = |E_a| = |E_b|$ in isomorphic linear
subfeatures $e_a = (E_a, \Omega_e)$ and $e_b = (E_b, \Omega_e)$ such that $e_a \subseteq a$ and $e_b \subseteq b$ is
called an index of co-reference.

Definition. Subsystem $e$ is called a maximum co-reference if $e$ is the
largest co-reference subsystem in $a$ and $b$.

Theorem 1. If the number of elements in linear features $a$ and $b$ equals $n$,
then their maximum co-reference subsystem $e$ can be found in $O(n^3)$
comparisons of matrixes for the worst-case scenario.

Proof. To prove this theorem we note that the task is equivalent to finding
the largest common submatrix such as shown in Tables 1 and 2 below.
This submatrix should be centered on the diagonal of the two matrixes for $a$
and $b$ as shown in Table 2. The total number of such matrixes is
$n + (n-1) + (n-2) + \dots + 2 + 1 = (n+1)n/2$. To find the largest common submatrix we
need to compare submatrixes of the same size $i \times i$ in both matrixes.



Table 1. Illustrative matrix for feature a

      L1   L2   L3   L4   L5   L6
L1     1    0    1    1    0    1
L2          1    0    1    0    0
L3               1    0    1    0
L4                    1    1    0
L5                         1    1
L6                              1



Table 2. Illustrative matrix for feature b




There are $n$ smallest $1 \times 1$ submatrixes, each containing a single binary
number. These submatrixes are the individual diagonal elements. Each
submatrix (element) from $A_a$ should be compared with $n$ submatrixes from
$A_b$, that is, $n^2$ matrix comparisons. Because every $1 \times 1$ submatrix contains
only one element, there is the same number of binary number comparisons.

There are $n-1$ matrixes of the next size, $2 \times 2$. Each such matrix contains
4 elements (only three of them really need to be compared because the
matrix is antisymmetric). Each of the $(n-1)$ matrixes in $A_a$ needs to be compared
with all $(n-1)$ submatrixes in $A_b$. This requires $(n-1)^2$ matrix comparisons
and at most $2^2 (n-1)^2$ binary number comparisons.

Similarly, for submatrixes of size $(k+1) \times (k+1)$ there are $(n-k)^2$ matrix
comparisons.

The total number of matrix comparisons is $O(n^3)$ because:

$n^2 + (n-1)^2 + (n-2)^2 + (n-3)^2 + \dots + (n-k)^2 + \dots + 2^2 + 1^2 = n(n+1)(2n+1)/6.$

To compute the complexity of binary comparisons we need to consider
the number of elements in each matrix. Each matrix of size $(k+1) \times (k+1)$
contains $(k+1)^2$ binary elements. Because the matrixes are antisymmetric we
need to compare only $(k+1) + k + \dots + 3 + 2 + 1 = (k+2)(k+1)/2$ binary
elements. Combining the number of matrix comparisons for each size, $(n-k)^2$,
with the upper bound on the number of elements in each matrix, $(k+1)^2$,
gives the total number of binary comparisons:

$n^2 \cdot 1^2 + (n-1)^2 \cdot 2^2 + (n-2)^2 \cdot 3^2 + \dots + (n-k)^2 (k+1)^2 + \dots + 2^2 (n-1)^2 + 1^2 \cdot n^2.$

To estimate this number of comparisons we note that each term is
smaller than $n^4$ and there are $n$ terms in the sum, so the sum is less than
$n^5$. Thus, the number of binary comparisons is $O(n^5)$.

This polynomial complexity can be significantly reduced by converting
each matrix $A_a$ and $A_b$ to special linear structures with $n$ elements, as we
discuss in section 3 below. Similarly to the previous consideration, it is $O(n^3)$
for linear structure comparisons, but it requires fewer binary comparisons. For
submatrixes of size $s \times s$ it is $s$ instead of $s^2$. Thus, the total complexity is
$O(n^4)$ in contrast with the $O(n^5)$ shown above. Generation of the special linear
structure itself requires $n \lg n$ binary comparisons for sorting, so that does not
change the total complexity, that is, $O(n^4)$.

Another important computational issue is the fact of monotonicity, that is 
if none of the common submatrices of size sx s are found then there is no 
reason to search for common submatrices of a larger size. These submatrices 
can not be shared by Aa and Ab. Thus, the worst-case complexity often is not 
the case. 
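
The search described in the proof, together with the monotonicity shortcut, can be sketched as follows. This is a minimal sketch, not the implementation used by the authors: it assumes that each feature is given as a square 0/1 matrix whose upper triangle (including the diagonal) carries the relation, and it compares diagonal blocks pairwise, as in the counting argument above. The matrix `m_b` is a hypothetical example, since the content of Table 2 is not reproduced here.

```python
def diagonal_block(matrix, start, size):
    """Upper-triangular entries of the size x size block on the diagonal."""
    return tuple(
        matrix[r][c]
        for r in range(start, start + size)
        for c in range(r, start + size)
    )


def largest_common_block(m_a, m_b):
    """Return (size, start_a, start_b) of the largest shared diagonal block.

    Sizes are tried in increasing order; by monotonicity, the search stops
    at the first size for which no common block exists.
    """
    best = (0, None, None)
    for size in range(1, min(len(m_a), len(m_b)) + 1):
        match = next(
            ((size, i, j)
             for i in range(len(m_a) - size + 1)
             for j in range(len(m_b) - size + 1)
             if diagonal_block(m_a, i, size) == diagonal_block(m_b, j, size)),
            None,
        )
        if match is None:
            break  # no larger common block can exist
        best = match
    return best


# Matrix of Table 1 (entries below the diagonal are ignored).
m_a = [
    [1, 0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 1],
]
# Hypothetical matrix for a second feature.
m_b = [
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

if __name__ == "__main__":
    print(largest_common_block(m_a, m_b))  # prints (3, 0, 1)
```

In this example the search stops after failing at size 4, so the larger sizes are never examined, which is the practical effect of monotonicity.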

2.3 The linear feature as an invariant 

Theorem 1 sets up a framework for using co-reference candidates (CRC) 
as algebraic invariants for conflation. In this section, we describe a simpli- 
fied simulation procedure to identify the scope for such invariants. The in- 
formal hypothesis is that if an image contains linear features with significant 
variation of angles then the CRC will work better than for images containing 
smaller variations. A similar hypothesis can be made about the lengths of 
linear intervals in a feature — if lengths vary significantly (e.g. one interval is 
double the size of the next) for a given feature then the co-reference candi- 
dates extracted would provide a better reference between images. Testing 
these hypotheses means studying robustness of the algebraic invariants. 

Analysis of robustness. Let us be given a sequence of connected linear
intervals $a_1 = [v_0, v_1]$, $a_2 = [v_1, v_2]$, ..., $a_n = [v_{n-1}, v_n]$ in an image,
which forms a linear feature (Figure 4).

We define and compute a relation $P$ on this feature as follows.
Let $P_i = P(a_i, a_{i+1})$, where

$P(a_i, a_{i+1}) = 0 \Leftrightarrow D(a_i) < D(a_{i+1})$ and
$P(a_i, a_{i+1}) = 1 \Leftrightarrow D(a_i) \ge D(a_{i+1})$.

This relation can be represented as a binary vector,
$t = (P_1, P_2, P_3, \dots, P_k, \dots, P_{n-1})$. For example, consider a specific
collection of linear intervals that generate a vector $t_1 = (1, 0, 1, \dots)$. Note
that this vector contains information about the relative lengths of successive
intervals. For example, $t_1$ states that $a_1$ is no shorter than $a_2$ while $a_2$ is
shorter than $a_3$. Denote the $i$-th component of $t_1$ as $t_{1i}$.

Next, for the purpose of simulation, suppose we apply non-linear
transformations to the points that comprise each of the $a_i$ intervals. Let a
vertex be $v = (x, y)$; then the transformed vertex $v'$ is obtained
component-wise as

$v' = (f(x, y), g(x, y)).$

The relation $P$ would then be recomputed for the transformed intervals.
Let $t_2 = (P'_1, P'_2, P'_3, \dots, P'_k, \dots, P'_{n-1})$ collect the results of the
transformed relation. The vector $t_2$ can then be compared with the vector $t_1$ by
computing the Hamming distance $H$ between these vectors:

$H(t_1, t_2) = \sum_{k=1}^{n-1} (t_{1k} - t_{2k})^2.$

This distance measures the number of distortions produced by transfor- 
mations f and g in relation P. If the distance is 0 then the feature is a P- 
invariant for the (f, g) transformation. 
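
A small simulation of this test is sketched below. It is an illustration only: the polyline and the two transformations are hypothetical examples, and the function names `relation_p`, `transform` and `hamming` are not from the original text.

```python
import math


def interval_lengths(vertices):
    """D(a_i) for the consecutive intervals of a polyline."""
    return [math.hypot(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(vertices, vertices[1:])]


def relation_p(vertices):
    """Binary vector t: P_i = 1 if D(a_i) >= D(a_{i+1}), else 0."""
    d = interval_lengths(vertices)
    return [1 if d[i] >= d[i + 1] else 0 for i in range(len(d) - 1)]


def transform(vertices, f, g):
    """Apply v' = (f(x, y), g(x, y)) to every vertex."""
    return [(f(x, y), g(x, y)) for x, y in vertices]


def hamming(t1, t2):
    """Number of positions in which two binary vectors differ."""
    return sum((p - q) ** 2 for p, q in zip(t1, t2))


if __name__ == "__main__":
    feature = [(0, 0), (4, 0), (6, 0), (6, 5), (7, 5)]
    t1 = relation_p(feature)  # interval lengths 4, 2, 5, 1
    scaled = transform(feature, lambda x, y: 2 * x, lambda x, y: 2 * y)
    warped = transform(feature, lambda x, y: x * x / 2, lambda x, y: y)
    print(t1, hamming(t1, relation_p(scaled)), hamming(t1, relation_p(warped)))
    # prints [1, 0, 1] 0 3
```

In this example the feature is P-invariant under uniform scaling (distance 0), while the non-linear stretch of the x axis reverses all three length relations (distance 3).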

Consecutive angles can be treated in a similar way. 

Let $Q_i = Q(a_i, a_{i+1}, a_{i+2})$, where

$Q(a_i, a_{i+1}, a_{i+2}) = 0 \Leftrightarrow L(a_i, a_{i+1}) < L(a_{i+1}, a_{i+2})$,

$Q(a_i, a_{i+1}, a_{i+2}) = 1 \Leftrightarrow L(a_i, a_{i+1}) \ge L(a_{i+1}, a_{i+2})$, and

$t = (Q_1, Q_2, Q_3, \dots, Q_k, \dots, Q_{n-2})$.

As before, a transformed vector $t_2$ can then be compared with a vector $t_1$
by computing the Hamming distance $H$:

$H(t_1, t_2) = \sum_{k=1}^{n-2} (t_{1k} - t_{2k})^2.$

Again, this measures the number of distortions in the angles produced 
by transformations f and g in relation Q and if H is 0 then the feature is 
called Q-invariant for the (f, g) transformation. Note, all angles are meas- 
ured consistently. 
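
The angle relation can be simulated in the same way. The sketch below assumes one particular consistent convention, the counter-clockwise turn from one interval to the next, reduced to [0, 360) degrees; other conventions are possible. The polyline and the function names `turn_angles`, `relation_q` and `hamming` are hypothetical.

```python
import math


def turn_angles(vertices):
    """Angle in degrees, in [0, 360), between each pair of consecutive intervals."""
    headings = [math.atan2(y1 - y0, x1 - x0)
                for (x0, y0), (x1, y1) in zip(vertices, vertices[1:])]
    return [math.degrees(h2 - h1) % 360.0
            for h1, h2 in zip(headings, headings[1:])]


def relation_q(vertices):
    """Binary vector t: Q_i = 1 if L(a_i, a_{i+1}) >= L(a_{i+1}, a_{i+2}), else 0."""
    ang = turn_angles(vertices)
    return [1 if ang[i] >= ang[i + 1] else 0 for i in range(len(ang) - 1)]


def hamming(t1, t2):
    """Number of positions in which two binary vectors differ."""
    return sum((p - q) ** 2 for p, q in zip(t1, t2))


if __name__ == "__main__":
    feature = [(0, 0), (4, 0), (4, 3), (7, 3), (9, 5)]  # turns of 90, 270, 45 degrees
    c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
    rotated = [(2 * (c * x - s * y), 2 * (s * x + c * y)) for x, y in feature]
    mirrored = [(x, -y) for x, y in feature]
    t1 = relation_q(feature)
    print(t1, hamming(t1, relation_q(rotated)), hamming(t1, relation_q(mirrored)))
    # prints [0, 1] 0 2
```

With this convention the example feature is Q-invariant under rotation and uniform scaling, but a mirror reflection reverses the sense of every turn and changes both components, which illustrates why the angles must be measured consistently in both images.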

2.4 Similarity measures 

Common problems with polylines are discontinuities that result from im- 
age resolutions, differences in image acquisition, and artifacts of feature ex- 
traction algorithms. Extracted features can be modified in two ways to give 
a common resolution complexity to facilitate comparisons. 

Consider the case of a curvilinear feature that is segmented as a result of 
something obscuring it, such as a road shaded by a tree. By connecting seg- 
ments that are “close enough” and have “small” deviation angles, a compos- 
ite feature can be formed. The maximum separation distance and the maxi- 
mum deviation angle parameters permitted are clearly critical to feature 
creation and to one’s confidence in the result. 

Another type of feature modification is necessary to simplify curvilinear 
features with one or more relatively narrow lobes for comparison to one with 
no narrow lobes. For example, a state road map will typically depict a coastline
with very little structure. A high-resolution aerial photograph, on the
other hand, will show the same coastline with lots of structure, showing that 
it goes inland for miles along river channels and juts out around spits of land. 
The higher resolution feature can be simplified by removing these lobes if 
they are “sufficiently narrow.” Critical parameters here are the unit size of 
feature sampling and maximum jumping distance permitted. 

With images prepared with this preprocessing, it is possible to register 
images that appear at first to have few if any features in common and are of 
unknown scale and orientation. 

Measures of spatial similarity of polylines also need to be developed. 
Here, the focus is on spatial similarity characteristics while the similarity of 
non-spatial feature attributes can be matched after a spatial match is con- 
firmed. If two images are matched using only a few reference points, the 
similarity of other points also needs to be assessed. 

The issue of variability of points that form a polyline also needs to be
addressed. Different feature extraction algorithms and imagery analysts can
assign points differently on the same physical feature. This can affect finding
co-reference candidate features. The technique described below addresses
this problem in a computationally efficient way.

Consider the polyline in Figure 5 and the successive approximations
defined by taking the midpoints of lengths along the polyline, as indicated.
This is called the Binary Sequential Division (BSD) method: the middle point
of the curve is found along the curve, and the step is then repeated for each
half, for the halves of the halves, and so on.




Figure 5. Structural interpolations of a polyline. Our experiments show that 8 binary
sequential divisions with $256 = 2^8$ linear segments is typically sufficient for interpolation.

More formally, a function G can be defined as follows for its first three
values:

$G(2^0) = G(1) = [p_1, p_2]$;

$G(2^1) = G(2) = [p_1, \mathrm{middle}(p_1, p_2), p_2]$;

$G(2^2) = G(G(2^1)) = [p_1, \mathrm{middle}(p_1, \mathrm{middle}(p_1, p_2)), \mathrm{middle}(p_1, p_2), \mathrm{middle}(\mathrm{middle}(p_1, p_2), p_2), p_2]$.
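
A compact way to compute these interpolations is sketched below. It assumes that "middle" means the midpoint by arc length along the original polyline, in which case $k$ rounds of halving are equivalent to sampling the curve at the arc-length fractions $j / 2^k$. The function names `point_at` and `bsd` are hypothetical.

```python
import math


def point_at(vertices, target):
    """Point located at arc length `target` along the polyline."""
    walked = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg > 0 and walked + seg >= target:
            t = (target - walked) / seg
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        walked += seg
    return vertices[-1]  # guard against rounding at the far end


def bsd(vertices, k):
    """G(2**k): 2**k + 1 points splitting the curve into 2**k equal arc lengths."""
    total = sum(math.hypot(x1 - x0, y1 - y0)
                for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]))
    n = 2 ** k
    return [point_at(vertices, total * j / n) for j in range(n + 1)]


if __name__ == "__main__":
    corner = [(0, 0), (4, 0), (4, 4)]
    for level in range(3):
        print(level, bsd(corner, level))
```

For the L-shaped example, level 0 returns only the two end points, level 1 adds the corner (the arc-length midpoint), and level 2 adds the quarter points (2, 0) and (4, 2).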

In general, the notation G(n) will be used to denote the n-th interpolation
of the polyline, and the notation S(n) will be used for the structure of
polyline G(n). Together, the pair <G(n), S(n)> is called a structured polyline
G(n). Next, we can define the concept of the structure S(n) of the polyline
using algebraic concepts for imagery conflation problems. For general
polylines we use the pair of matrixes, A and L, that represent relations
between angles (matrix A) and lengths of intervals (matrix L) in the polyline.
These matrixes are called structural matrixes. For each G(n) polyline, the
relations between lengths are trivial: all of them are equal by the G(n)
definition. Table 3 provides an example of the structural matrix for angles
for the original polyline depicted in Figure 5. For instance, the bold 0 as
the relation between Angle 1 and Angle 2 indicates that Angle 1 < Angle 2.
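
A structural matrix for angles can be built as in the following sketch. It assumes, by analogy with the relations P and Q above, that an entry is 0 when the row angle is smaller than the column angle and 1 otherwise, and that only entries above the diagonal are used. The angle values and the function name `structural_matrix` are hypothetical.

```python
def structural_matrix(angles):
    """Matrix with entry (i, j), i < j, equal to 1 if angles[i] >= angles[j], else 0.

    Entries on and below the diagonal are left as None because they carry
    no additional information.
    """
    n = len(angles)
    return [[(1 if angles[i] >= angles[j] else 0) if j > i else None
             for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    for row in structural_matrix([30, 120, 45, 200, 90]):
        print(["-" if v is None else v for v in row])
```

Two polylines whose angles differ in value but are ordered in the same way, for example (30, 120, 45, 200, 90) and (10, 50, 20, 80, 40), produce the same matrix, which is the sense in which the matrix describes structure and not geometry.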

Table 3. Structural matrix A for polyline a
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Angle 2 


Angle 3 


Angle 4 


Angle 5 


Angle 1 
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Angle 2 
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0 


Angle 3 
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Angle 4 
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0 
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Angle 5 
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0 


1 



Similarly we can construct matrixes A(n) for each polyline G(n). The ma- 
trix shown in Table 3 only reflects the upper level structure. For more de- 
tailed structure we may include in structure S more matrixes that reflect 
more specific structural properties, such as relation between differences be- 
tween angles and relations between second, third and so on differences be- 
tween angles, similar to second and third derivatives. 

Two structured polylines $G_a(n)$ and $G_b(n)$ are structurally equivalent if
their structures $S_a(n)$ and $S_b(n)$ are the same, that is, there is an isomorphism
between the structures. If we restrict the structure to the matrix A shown in
Table 3, then the equality of structures means the equality of such matrixes for
two different polylines.

Now we can discuss how to define measures of structural similarity be- 
tween two arbitrary polylines a and b and use these definitions for matching 
features and conflating images. 

Definition. Two polylines a and b are n-structurally equivalent if

$S_a(n) = S_b(n)$. (1)

Definition. We say that there is a monotone n-similarity between
polylines a and b if property (1) is true for every $n' \le n$, i.e., $S_a(n') = S_b(n')$.

Definition. The number n is called the measure of structural similarity
between polylines a and b if (i) the similarity between a and b is monotone and
(ii) n is the maximum of all n' for which (1) is true. Such n is denoted as
$n_{max}$.

There is no structural equivalency for n' greater than $n_{max}$:

$\forall n' > n_{max}$: $S_a(n') \ne S_b(n')$.
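
The measure of structural similarity can be computed by walking through the interpolation levels until the structures first disagree. The sketch below assumes that the structures of both polylines have already been computed for successive levels and are stored in two lists in the same order; the function name `similarity_measure` and the placeholder structures are hypothetical.

```python
def similarity_measure(structures_a, structures_b):
    """Largest level n such that the structures agree at every level 1..n.

    Returns 0 if the structures differ already at the first level.
    """
    n_max = 0
    for level, (s_a, s_b) in enumerate(zip(structures_a, structures_b), start=1):
        if s_a != s_b:
            break  # monotone similarity ends at the first mismatch
        n_max = level
    return n_max


if __name__ == "__main__":
    # Placeholder structures; in practice each entry would be a structural matrix.
    s_a = ["S1", "S2", "S3", "S4"]
    s_b = ["S1", "S2", "X", "S4"]
    print(similarity_measure(s_a, s_b))  # prints 2
```

Stopping at the first mismatch reflects the monotonicity requirement: agreement at a later level (level 4 in the example) does not count once an earlier level has failed.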

The following definition describes the concept of a stable structure of a
polyline. Let $a_e$ be a polyline obtained from polyline a by extending or
cutting the end intervals of the polyline (see Figure 5). Index e in $a_e$ indicates
the added or deleted length of polyline a. Now we can produce structured
interpolations $\langle G_{a_e}(n), S_{a_e}(n) \rangle$ for $a_e$ and measure the similarity
between a and $a_e$.

Definition. Polyline a has an $(a_e, n)$-stable structure if the n-similarity
between a and $a_e$ is monotonic.

For matching purposes we use sets of n-stable polylines, say {a} and
{b}, in both images. We search among them for those features that have the
same structure. If such features do not exist, then we search for subpolylines
of polylines in {a} and {b} that have the same structure. If such features do
not exist either, then we search for pairs of features with the highest
measures of structural similarity, introduced above as the measure of
monotone n-similarity.

G(n) is used to denote the n-th interpolation of a polyline. As mentioned
above, $n = 2^k$, and k is called the BSD level. The first four steps of the
conflation process are:

Step 1. For raster images extract several linear features as sets of points 
(pixels), S. For vector images skip this step. 

Step 2. Vectorize extracted linear features. For vector images skip this step. 
Step 3. For both raster and vector images analyze the complexity and con- 
nectivity of vectorized linear features. If features are too simple (contain 
few points and are small relative to the image size) combine several fea- 
tures in a superfeature. If features are too complex, simplify features by 
applying a gap analysis algorithm. In the ideal situation we also should 
be able to separate feature extraction algorithm artifacts from real fea- 
tures. In the example here, the algorithm introduced artifacts, by captur- 
ing vegetation as a part of the shoreline in several places. 

Step 4. Interpolate each superfeature as a specially designed polyline using 
the BSD method. 

Figure 6 depicts level 1 BSD interpolations with k=1 and n=2 for the
vectorized features shown in Figure 3. The middle points of each feature are
as shown, computed along each line. Significant fluctuations have been lost in
the lower resolution image. Feature M as interpolated has angles A1, A2, and
A3. Feature L as interpolated has angles B1, B2, and B3.

Figure 7 depicts the next level of BSD interpolations for the same vector- 
ized features from Figure 3. 
Figure 6. Sections of the extracted shorelines with the first level BSD interpolations with k=1
and n=2




Figure 7. Fragment of BSD level 2 for the two polylines 

We also developed a modification of the method that is based on
lengths and makes it possible to measure a structural difference in lines in
the following way:

Step 1: Compute L, the length of the line from its start to the middle point M
along the line.

Step 2: Compute the lengths of the two “shoulders”, S1 and S2. Here S1 is the
length of the straight line [T, M] between the start point T and the
middle point M, and S2 is the length of the straight line [M, E]
between the middle point M and the end point E.

Step 3: Compute the ratios R1=S1/L and R2=S2/L, where L is half of the
length of the feature computed along the feature. If the ratio R1 is close
to 1, then the first part of the feature is close to a straight line; the
same holds for R2. If both R1 and R2 are significantly larger than 1 and
have similar values, say R1=10.5 and R2=11.4, then on the first level of
structural similarity features F1 and F2 are similar.

Step 4: Repeat steps 1-3 recursively for the subfeatures [T, M] and [M, E]
and for their subfeatures until all ratios are equal to 1 (straight line).
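
The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`path_length`, `split_at_middle`, `shoulder_ratios`, `structure`), the 2-D tuple representation of nodes, the tolerance `tol`, and the recursion cap `max_depth` are all hypothetical choices made here. The cap is needed because the middle point is interpolated along the line, so a corner that never coincides with a middle point would otherwise be subdivided indefinitely.

```python
import math

def path_length(points):
    """Length of a polyline measured along its segments."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def split_at_middle(points):
    """Split a polyline at the point M halfway along its length (Step 1).

    M is interpolated on a segment when it does not coincide with a node.
    Returns (left, right); both parts contain M.
    """
    half = path_length(points) / 2.0
    walked = 0.0
    for i, (p, q) in enumerate(zip(points, points[1:])):
        seg = math.dist(p, q)
        if seg > 0 and walked + seg >= half:
            t = (half - walked) / seg
            m = (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
            return points[: i + 1] + [m], [m] + points[i + 1 :]
        walked += seg
    return list(points), [points[-1]]  # zero-length polyline fallback

def shoulder_ratios(points):
    """Steps 2-3: ratios R1 = S1/L and R2 = S2/L for one polyline."""
    half = path_length(points) / 2.0
    if half == 0:
        return 1.0, 1.0
    left, right = split_at_middle(points)
    s1 = math.dist(left[0], left[-1])    # shoulder [T, M]
    s2 = math.dist(right[0], right[-1])  # shoulder [M, E]
    return s1 / half, s2 / half

def structure(points, depth=0, max_depth=8, tol=1e-9):
    """Step 4: list of (level, R1, R2), recursing into non-straight halves."""
    r1, r2 = shoulder_ratios(points)
    levels = [(depth, r1, r2)]
    if depth < max_depth:  # max_depth is an assumption of this sketch
        left, right = split_at_middle(points)
        if 1 - r1 > tol:
            levels += structure(left, depth + 1, max_depth, tol)
        if 1 - r2 > tol:
            levels += structure(right, depth + 1, max_depth, tol)
    return levels

# Hypothetical test polyline: a symmetric zigzag with five nodes.
zigzag = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0)]
levels = structure(zigzag)
for level, r1, r2 in levels:
    print(level, round(r1, 4), round(r2, 4))
```

For this zigzag both first-level ratios equal 1/sqrt(2), and each half splits into two straight pieces, so the recursion stops after one further level.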

For any polyline with n nodes, step 4 needs to be repeated no more than
n lg n times to get all ratios equal to 1.

This can be shown by considering an extreme case (EC) where, after
finding the middle point M, only the single end point is in the right
“shoulder.” This means that n-1 nodes are in the part between the start
node T and M, including node $p_{n-1}$.

We know that the polyline between node $p_{n-1}$ and the end node $E = p_n$ is a
straight line $[p_{n-1}, E]$. Above we also assumed that M is between them; thus
[M, E] is a straight line and a part of the longer straight line $[p_{n-1}, E]$.
That is, the ratio R2 is equal to 1 for [M, E], and Step 4 took only one
iteration for the subfeature between M and E.

Now we assume that the same extreme case (EC) is true for the left
subfeature from T to M and subsequently for all its subfeatures. That is, we
need to repeat step 4 only n times for this extreme case.

If our previous assumption is not true and we have more than one node in
the subfeature between M and E, we can repeat such a binary search process at
most lg n times to come to the situation with a single straight line. With
n nodes in total, it may take n lg n loops of Step 4 in the worst case.

We visualize the structural length type of the feature related to lengths in 
Figure 8. 




Figure 8. Illustration of structural lengths. See also color plates. 



Figure 8 shows that the ratios R1 and R2 on the first level were basically
the same. On the second level they are again the same, but on the third level
the right shoulder is much larger than the left shoulder. On the fourth level
the right part is symmetric, but the left part is not. The last level, 5, also
provides a mix of symmetric and asymmetric cases.

2.5 Generation of matrices 

Matrices are constructed from the angle relationships and from the length
relationships of the polylines by using two algorithms, denoted as the Angle
Algorithm (AA) and the Shoulder Algorithm (SA). Matrix computation
forms steps 5 and 6 of the conflation algorithm:

Step 5. Compute a matrix Q of the relations between all angles on the
polyline by using the AA algorithm.

Step 6. Compute a matrix P of the relations between all lengths of intervals
on the polyline by using the SA algorithm.

Values of 1 and 0 are used to indicate ≥ and <, respectively. For this
example, the angular relations for the two polylines are presented in Table 4
and the length (or shoulder) relations are in Table 5.



Table 4. Angular relations for A and B

        A1       A2       A3                A1   A2   A3
A1      A1≥A1    A1<A2    A1>A3       A1    1    0    1
A2      A2>A1    A2≥A2    A2>A3       A2    1    1    1
A3      A3<A1    A3<A2    A3≥A3       A3    0    0    1

        B1       B2       B3                B1   B2   B3
B1      B1≥B1    B1<B2    B1>B3       B1    1    0    1
B2      B2>B1    B2≥B2    B2>B3       B2    1    1    1
B3      B3<B1    B3<B2    B3≥B3       B3    0    0    1



Table 5. Length (shoulder) relations for S and T

        S1       S2       S3                S1   S2   S3
S1      S1≥S1    S1>S2    S1>S3       S1    1    1    1
S2      S2<S1    S2≥S2    S2<S3       S2    0    1    0
S3      S3<S1    S3>S2    S3≥S3       S3    0    1    1

        T1       T2       T3                T1   T2   T3
T1      T1≥T1    T1>T2    T1>T3       T1    1    1    1
T2      T2<T1    T2≥T2    T2<T3       T2    0    1    0
T3      T3<T1    T3>T2    T3≥T3       T3    0    1    1


Table 6. Matrices for features from images A and B for angles marked 1, 2, and 3.
Entries marked * indicate differences between the angular relations in the two features.

Image A: angles 1: 2.206752, 2: 2.389911, 3: 2.797306

        1    2    3
1       1    0*   0*
2            1    0
3                 1

Image B: angles 1: 2.906888, 2: 2.467343, 3: 2.702809

        1    2    3
1       1    1*   1*
2            1    0
3                 1



Tables 4 and 5 show matrices for BSD level 1 only. In the general case,
for BSD level k there are $2^k - 1 = n - 1$ segments and each generated matrix
has size $(n-1) \times (n-1)$. Table 6 shows part of BSD level 2 for the same
features.
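
As a hedged illustration of how such relation matrices can be built and compared, the sketch below recomputes the upper-triangular entries of Table 6 from the listed angle values. The helper names `relation_matrix` and `upper_triangle_differences` are hypothetical, and the angle values are assumed to be in radians.

```python
def relation_matrix(values):
    """Entry [i][j] is 1 if values[i] >= values[j], else 0."""
    return [[1 if vi >= vj else 0 for vj in values] for vi in values]

def upper_triangle_differences(p, q):
    """0-based index pairs (i, j), i <= j, where two matrices disagree."""
    n = len(p)
    return [(i, j) for i in range(n) for j in range(i, n) if p[i][j] != q[i][j]]

angles_a = [2.206752, 2.389911, 2.797306]  # image A (Table 6), assumed radians
angles_b = [2.906888, 2.467343, 2.702809]  # image B (Table 6), assumed radians

qa = relation_matrix(angles_a)
qb = relation_matrix(angles_b)
diff = upper_triangle_differences(qa, qb)
print(diff)  # cells of the first row that differ between the two features
```

Only the upper triangle is compared here because, as in Table 6, the lower triangle carries no additional information.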

Another way to compare interpolated polylines is to construct a matrix of
shoulder/length ratios S/L, where the length L is computed as the distance
between the shoulder end points along the polyline. We use two relations:
$S_i/L_i \geq S_j/L_j$ and $|S_i/L_i - S_j/L_j| < \varepsilon$. In the matrix for the
relation $S_i/L_i \geq S_j/L_j$ all diagonal cells (i = j) are equal to 1 because
$S_i/L_i \geq S_i/L_i$ is always true. Also $S_1/L_1 = 1$, since by definition
$S_1 = L_1$ for the base line.
Next, $L_2 = L_3$ by the middle point design; thus $S_3/L_3 > S_2/L_2$ is true if
and only if $S_3 > S_2$. We also know by definition that $1 \geq S_i/L_i$, because
$S_i$, as a straight line, is shorter than or equal to the curve $L_i$ between the
same points. This analysis shows that the relations for ratios S/L are the
same as for S. But ratios can provide more relations; for instance, we may
discover that $2 S_2 < S_3$, i.e., $S_2$ is less than half of $S_3$.

2.6 Testing data closeness 

Once the matrices have been generated, measures of closeness may be 
calculated for each polyline in one image with each polyline in the other. 
The AA and SA algorithms make these comparisons. These comparisons 
form steps 7 and 8 of the Conflation Method: 

Step 7. Testing closeness for angles. 

Step 8. Testing closeness for lengths. 

In the example, Tables 4 and 5 show that the matrices for the two features
are identical in both images. This means that we have the highest closeness of
the given features on BSD level 1. However, Table 6 reveals that the matrices
for BSD level 2 are different and there is no match on this level. If there
are successful matches, higher BSD levels are explored until the match fails.
A match level tree that shows the deepest match level reached for each section
on each BSD level can be used to make a final judgment about the feature
match.



Table 7. Angle comparisons using thresholds

Level 2 BSD angle comparison matrix: image A (columns) and image B (rows)
with threshold = 0.209440. Note, 1 indicates the difference in the column and
row values is greater than the threshold, $|L_i - L_j| > \varepsilon$. This matrix
indicates that the Level 2 BSD match fails. There is only one value in the
threshold limits.

        1    2    3
1       1    1    1
2            0    1
3                 0

The set of relations computed by the AA and SA algorithms in practice 
also include thresholds to make the methods more robust to noise and multi- 
resolution mismatch. This is illustrated in Table 7. 

2.7 Match forecasting and evaluation 

The goal of Step 9 is to identify the deepest level of structural match for
two features. The brute force version of this step is to repeat steps 7 and 8
for all deeper BSD levels k+1, k+2 and so on. The more efficient version of
step 9 described below first forecasts the feature match on the next BSD level
and computes the next BSD level only if the forecast is successful. Otherwise,
the forecasting algorithm explores a potential match for halves of the feature
(shoulders). If that fails too, then the features are cut down as described in
Steps 10 and 11. Step 9 contains the following substeps:

Step 9.1. If Steps 7 and 8 are successful on BSD level k, forecast potential 
match in the next BSD level k+1 using forecasting algorithm FA (see 
algorithm described below). 

Step 9.2. If match is forecasted by FA algorithm for BSD level k+1, repeat 
step 9 for BSD level k+1 until mismatch is forecasted. If mismatch is 
forecasted by FA algorithm for BSD level k+1, use FA algorithm for 
match forecasting for respective halves of the features (right and left 
shoulders). 

Step 9.3. Construct a match level tree that shows the deepest match level 
reached for each section on each BSD level. 

Step 9.4. Evaluate the tree to make a final judgment about feature match. 

2.7.1 Match prediction using SA algorithm 

Algorithm AA for BSD level k=0 provides us the lengths of three curves
L1, L2 and L3 (see Figure 5), where L1 is the base line connecting the two
ends of the feature curve, and L2=L3 by the definition of the line middle
point used to build them. Similarly, the SA algorithm for k=0 produces three
straight lines (shoulders) S1, S2 and S3 connecting the two ends and the
middle point of the feature curve (see Figure 6). Similarly, for every other
k, the AA and SA algorithms provide a set of $L_i$ for each polyline segment
along the curve and a set of shoulders $S_i$ between those segments (see an
example in Figure 7).

Next we compute all $S_i/L_i$ for BSD level k=0 for image A and similar
ratios for image B, denoted as $T_i/M_i$ in Figure 6. Say these ratios are 1,
0.5 and 0.8 for image A and 1, 0.3 and 0.83 for image B.

After that we search for a pair of shoulders $\langle S_i/L_i, T_i/M_i \rangle$ such
that $|S_i/L_i - T_i/M_i| > \varepsilon$ in the two images. If such shoulders are
found, this is declared a mismatch forecast. If there are no such shoulders,
the FA algorithm forecasts a potential match on the next level and the
Conflation Method proceeds to compute BSD level k+1.

In the example above, $|S_2/L_2 - T_2/M_2| = |0.5 - 0.3| > \varepsilon$ if
$\varepsilon = 0.1$. This means that the CM method will not compute the next
BSD level k+1 for the whole feature but will compute BSD for the third
section of the curve, where $|S_3/L_3 - T_3/M_3| = |0.8 - 0.83| < 0.1$.
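
The forecasting rule can be illustrated with the ratios of the example. The sketch below is a minimal reading of the rule, not the authors' FA algorithm itself: the function name `forecast_mismatches` is hypothetical, and the inputs are the ratio values and the threshold 0.1 quoted in the text.

```python
def forecast_mismatches(ratios_a, ratios_b, eps):
    """0-based indices of shoulders whose S/L ratios differ by more than eps."""
    return [i for i, (ra, rb) in enumerate(zip(ratios_a, ratios_b))
            if abs(ra - rb) > eps]

ratios_a = [1.0, 0.5, 0.8]   # S_i/L_i for image A
ratios_b = [1.0, 0.3, 0.83]  # T_i/M_i for image B

mismatched = forecast_mismatches(ratios_a, ratios_b, eps=0.1)
whole_feature_match_forecast = not mismatched
# Sections whose ratios stay within eps remain candidates for deeper BSD levels.
candidates = [i for i in range(len(ratios_a)) if i not in mismatched]
print(mismatched, whole_feature_match_forecast, candidates)
```

With these values only the second shoulder exceeds the threshold, so a mismatch is forecast for the whole feature while the third section remains a candidate for further subdivision.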

The motivation for this algorithm is the following. A significant
difference in ratio values between two features indicates that these
differences will show up on a deeper level of BSD and that the features will
not match at higher levels. However, we cannot say at which higher level this
will actually happen. The probability of getting a match on the next BSD level
is lower for shoulders with very different ratios than for shoulders with
similar ratios. Our simulation experiments confirm this. It is also consistent
with the expectation that the probability of a high level of match (k>4) for
images with significant noise and different resolution is low. Thus, we cannot
expect many of these very good cases, and since the majority of matches are
low-level, the FA algorithm will significantly shorten computation time.

2.7.2 Cutting units from features 

The next step is to explore the situation when a feature match is not 
found. In this case, a search for matching sub-features is initiated by cutting 
a predefined unit from the first superfeature and repeating, keeping other 
superfeatures unchanged. Currently we use a unit size between 1/200 of the 
image largest measurement and half the size of a large feature. It is sug- 
gested to start from large units to save time. 

If no match is found, the process is repeated by sequentially cutting units
from all superfeatures until a match is found or the superfeatures are gone.
This process forms steps 10 and 11 of the conflation algorithm:

Step 10. If Steps 7 and 8 fail, cut a predefined unit from the first
superfeature and repeat steps 4-9 for this modified superfeature and the
other, unchanged superfeatures.

Step 11. If a match is not found, repeat step 10 by sequentially cutting
units from all superfeatures until a match is found or the superfeatures
are completely cut down.




Figure 9. Step 9: Successful match of vectorized features and original image. 



Figure 9 shows the final conflation result for the example discussed. 
Optimizing unit size. The unit size u used in steps 10 and 11 is the most
critical parameter of the whole CM method. If the unit size is equal to the
length L of the largest superfeature, then steps 10 and 11 are empty. If the
unit is half of L, u=L/2, then step 11 will be repeated at most two times
for each superfeature, that is, n×m times in total, where n is the number of
superfeatures in image A and m is the number of superfeatures in image B. Our
experiments show that three superfeatures per image were sufficient to find
the match. Thus, n×m could be bounded by 9 runs of step 11. If each
superfeature contains r units, then step 11 will be run (rn)×(rm) = r²nm
times if there is no time optimization. This is a polynomial time complexity
function. The combination of the AA and SA algorithms provides such time
optimization by cutting the running time of step 11 by half.



3. FEATURE CORRELATING ALGORITHMS 

We begin this section by presenting a simplified version of an algorithm
that finds a maximum co-reference. In essence this algorithm finds the
largest n-gram common to two Boolean vectors $t_1$ and $t_2$. This common n-gram
cannot be longer than $\min(n_1, n_2) - H(t_1, t_2)$. There are several known
algorithms for finding n-grams with a finite alphabet. In our case, the
alphabet is the smallest possible, {0,1}. Thus, the search can be relatively
fast.

At times, it may be necessary to consider more than consecutive intervals.
Using a notation similar to the previous section, let us denote the angle
between $a_i$ and $a_{i+1}$ as $L_i = L(a_i, a_{i+1})$. Now record the comparisons
between all pairs of angles in a given feature in a matrix. As above, record
the i,j entry as 0 iff $L_i < L_j$ and 1 iff $L_i > L_j$. Note that the resulting
matrix is anti-symmetric. Now, when two such matrices are built for two
features a and b, a more complex approach to determining a maximum
co-reference is based on searching for the largest common part of these
matrices. This is done by “sliding down” the diagonals of the matrices.
Consider the example in Tables 1 and 2, where the largest common parts are
highlighted. We will refer to this Matrix Matching Algorithm as MMA1. Later we
will also examine an improved Matrix Matching Algorithm, MMA2.

Next we will define a concept used in representing new Linear Structure
Algorithms, which we will refer to as LS1 and LS2.

Definition. An algebraic system $LS = \langle I, R, \Omega_1 \rangle$ is called a linear
structure for feature A if

(1) I is the set of integer indexes of elements $a_i$ from feature A according to
their order in A, that is, $I = \{1, 2, \ldots, n\}$.

(2) The signature $\Omega_1 = \{\geq_\varphi, \geq_\psi, \varphi, \psi\}$ contains two
relations $\geq_\varphi$, $\geq_\psi$ and two sorting functions $\varphi$ and $\psi$ that are
defined on I such that:

$\forall i, j \in I: \; \varphi(i) \geq_\varphi \varphi(j) \Leftrightarrow D(a_i) \geq D(a_j)$ and

$\forall i, j, k, m \in I: \; \psi(i, j) \geq_\psi \psi(k, m) \Leftrightarrow L(a_i, a_j) \geq L(a_k, a_m)$.

The function $\varphi$ shows the results of sorting intervals according to their
lengths and $\psi$ shows the results of sorting angles between adjacent intervals
according to their magnitude.

Example: The sequence of lengths of intervals $a_1, a_2, a_3, a_4$ is 0.1m,
4.3m, 7.6m and 0.5m. Their indexes can be written as {1, 2, 3, 4}; then, by
sorting the lengths in descending order, the structure's sequence can be
represented by the reordering of the original indices: {3, 2, 4, 1}.

In formal terms, this means that we have a substitution (permutation) $\varphi$
for the indexes of lengths such that

$$\varphi = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 2 & 4 & 1 \end{pmatrix}.$$

A similar index substitution function is produced for angles (i.e. for pairs
of linear segments with indexes [i, j] and [k, m]).

3.1 The Matrix Matching Algorithms 

In this section, we begin by presenting a detailed analysis of MMA1
summarized above. We then analyze an improved Matrix Matching Algorithm,
MMA2.

3.1.1 An exact complexity formula for MMA1

It has been shown above that an upper bound on the complexity of
MMA1 is $O(n^5)$, but the exact formula was not provided [Kovalerchuk &
Sumner, 2003]. The general bound is:

$$n^2 \cdot 1^2 + (n-1)^2 \cdot 2^2 + (n-2)^2 \cdot 3^2 + \ldots + 2^2 (n-1)^2 + 1^2 \cdot n^2.$$

We will now demonstrate an exact representation for the complexity of
MMA1.

Theorem 1: The complexity of MMA1 when applied to n×n matrices is
given by

$$\frac{1}{120}\left[2n^5 + 5n^4 - 5n^2 - 2n\right].$$
Proof. Let S1 and S2 be n×n comparison matrices that share a common
subsystem. As mentioned in [2, 3], the matrices being considered are
anti-symmetric, so only upper triangular comparisons are required. Thus,

$$(n-k) + (n-k-1) + \ldots + 3 + 2 + 1 = (n-k+1)(n-k)/2$$

binary comparisons need to be made for each subsystem. Thus, the first
terms of each item in the sum can be replaced by $(n-k+1)(n-k)/2$, which leads
to the following representation for the full system:

$$(n(n-1)/2) \cdot 1^2 + ((n-1)(n-2)/2) \cdot 2^2 + \ldots + (1 \cdot 0/2) \cdot n^2.$$

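We can represent and then manipulate the equation as follows.

$$\sum_{k=1}^{n} k^2 (n-k+1)(n-k)/2 = \frac{1}{2} \sum_{k=1}^{n} \left(k^4 - (2n+1)k^3 + (n^2+n)k^2\right)$$

$$= \frac{1}{2}\left[\sum_{k=1}^{n} k^4 - (2n+1)\sum_{k=1}^{n} k^3 + (n^2+n)\sum_{k=1}^{n} k^2\right]$$

$$= \frac{1}{2}\left[\left(\tfrac{1}{5}n^5 + \tfrac{1}{2}n^4 + \tfrac{1}{3}n^3 - \tfrac{1}{30}n\right) - (2n+1)\left(\tfrac{1}{4}n^4 + \tfrac{1}{2}n^3 + \tfrac{1}{4}n^2\right) + (n^2+n)\left(\tfrac{1}{3}n^3 + \tfrac{1}{2}n^2 + \tfrac{1}{6}n\right)\right]$$

$$= \frac{1}{120}\left(2n^5 + 5n^4 - 5n^2 - 2n\right).$$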

Thus, we have shown this is indeed $O(n^5)$. For comparison purposes as
we proceed, note that for n = 10 this is 2079.

It is interesting to note that for small n (n<60) this equation behaves
more like $O(n^4)$ due to the small leading coefficient. However, it is not
generally unusual to have n > 60 segments in a feature.
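
The closed form of Theorem 1 can be checked numerically against the sum it was derived from. The short sketch below does this with exact integer arithmetic; the function names `mma1_sum` and `mma1_closed` are hypothetical and only serve this check.

```python
def mma1_sum(n):
    """Direct evaluation of sum_{k=1..n} k^2 (n-k+1)(n-k)/2."""
    # (n-k+1)(n-k) is a product of consecutive integers, so it is always even.
    return sum(k * k * ((n - k + 1) * (n - k) // 2) for k in range(1, n + 1))

def mma1_closed(n):
    """Closed form (2n^5 + 5n^4 - 5n^2 - 2n) / 120 from Theorem 1."""
    return (2 * n**5 + 5 * n**4 - 5 * n**2 - 2 * n) // 120

for n in (1, 2, 10, 60):
    print(n, mma1_sum(n), mma1_closed(n))
```

The two functions agree for every n tested, and both return 2079 for n = 10, the value quoted in the text.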



3.1.2 The improved algorithm MMA2 

The performance of MMA1 can be improved further using the principle
of monotonicity. In this method, the largest potentially matching matrices
are searched first. If the matrix match in progress fails at a given
location, we then continue from where the match was last successful. Thus, we
are able to continue looking for a smaller match originating from the same
starting positions without repeating work already performed.

In [Kovalerchuk & Sumner, 2003], the issue of monotonicity is presented
by noting that the worst case is often not what occurs. That is, if no common
matrices of size s×s are found, then there is no reason to search for larger
common submatrices. Here we take the reverse viewpoint, while still utilizing
the same basic principle. We attempt to verify the largest subsystems first
from a given set of starting positions in the matrices. If elements that do
not match are found during the comparisons, the entire match is not abandoned;
the match is just scaled down to include the last matching elements. In this
way, the principle of monotonicity can be combined with a dynamic programming
approach that reduces the complexity involved.

This can be represented by the equation

$$\sum_{k=1}^{n} k(n-k)(n-k+1)/2 = \frac{1}{2}\left[(n^2+n)\sum_{k=1}^{n} k - (2n+1)\sum_{k=1}^{n} k^2 + \sum_{k=1}^{n} k^3\right]$$

$$= \frac{1}{2}\left[(n^2+n)\left(\tfrac{1}{2}n^2 + \tfrac{1}{2}n\right) - (2n+1)\left(\tfrac{1}{3}n^3 + \tfrac{1}{2}n^2 + \tfrac{1}{6}n\right) + \left(\tfrac{1}{4}n^4 + \tfrac{1}{2}n^3 + \tfrac{1}{4}n^2\right)\right]$$

$$= (n^4 + 2n^3 - n^2 - 2n)/24.$$

We have shown the following theorem.

Theorem 2. The complexity of MMA2 when applied to n×n matrices is
$O(n^4)$, and is given by

$$(n^4 + 2n^3 - n^2 - 2n)/24.$$

Again as an example, consider n = 10, which yields 495.

3.2 The Linear structure algorithms 

In this section, we present two versions of the Linear Structure algorithm,
LS1 and LS2.

3.2.1 The Linear Structure algorithm, LS1

When setting up a comparison matrix, the values of the table reflect
whether the value of element x is greater than or equal to or less than the
value at element x+1, x+2, x+3, and so on. In deriving a program to model
all possible combinations of values that also satisfy these same relationships,
we found the best method was to first sort the values, keeping track of the
indices from which they came, set up a new group of values from lowest to
highest, then distribute them back into the proper order as per the original
indices. That is, the order of the original indices after sorting fully describes
the comparison matrix. Given this order, it is a simple matter to reconstruct
the matrix. In fact, this order itself can be used to compare two features in a
way that does not need the comparison matrices at all.

This is what we call the linear structure’s sequence. The following ex- 
ample illustrates this concept. 

Let us use the length measurements for feature S1: {5, 2, 8, 9, 4}, which
have the comparison matrix shown in Table 8. The values 0 and 1 are used
to represent x < y and x > y respectively, where x is a row value and y is a
column value.

Table 8. Comparison of lengths for feature S1

Length   5    2    8    9    4
5        -    1    0    0    1
2             -    0    0    0
8                  -    0    1
9                       -    1
4                            -

Now, we also have S2: {8, 9, 4, 6, 7} (see Table 9). The matching segment
consists of the last three elements of S1 (Table 8) and the first three
elements of S2 (Table 9).

Table 9. Comparison of lengths for feature S2

Length   8    9    4    6    7
8        -    0    1    1    1
9             -    1    1    1
4                  -    0    0
6                       -    0
7                            -

Let us build the index sequence for Table 8. This table depicts the set of
lengths of intervals {5, 2, 8, 9, 4} and their relationships. The structural
sequence for this table begins with the value 2, which is the index of the
smallest element (2). The next value, 5, indicates that the second smallest
element comes from index 5, whose value is 4. This process continues, and the
final sequence of the indices in S1 is {2, 5, 1, 3, 4}. The sequence of the
values by index in S2 is {3, 4, 5, 1, 2}; it is derived from Table 9 in a
manner similar to S1. At first it seems that there is no similarity, but when
the first two elements of S1 have been stripped off and the values ranked
again, we find the S1 LS-sequence to be {3, 1, 2}. Further, when the last two
elements of S2 have been stripped off, we find the S2 LS-sequence to be
{3, 1, 2} also. We have found the matching segments. All that is necessary is
to delineate all sequential element segments, convert them to linear
structures, then compare these sequences with other LS-sequences, starting
with the longest segments, until we find an exact match. Each sequence
requires sorting, which can be done in O(n log n) using, say, a mergesort
algorithm [Neapolitan, Naimipour, 1998], and the overall total involves the
addition of all combinations.
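
The LS-sequence construction and the longest-segment search can be sketched as follows for the two example features. This is an illustrative sketch only: the function names `ls_sequence` and `longest_common_structure` are hypothetical, and the dictionary lookup used to find equal sequences is a convenience of the sketch rather than the comparison scheme whose complexity is analyzed in this section. Python's built-in `sorted` is stable, which matches the requirement that equal values keep their original order.

```python
def ls_sequence(values):
    """1-based indices of the values in ascending order (ties keep their order)."""
    return [i + 1 for i in sorted(range(len(values)), key=lambda i: values[i])]

def longest_common_structure(a, b):
    """Longest contiguous segments of a and b with equal LS-sequences.

    Returns (length, start_in_a, start_in_b) with 0-based starts, or None.
    """
    for length in range(min(len(a), len(b)), 0, -1):  # longest segments first
        seen = {}
        for i in range(len(a) - length + 1):
            seen.setdefault(tuple(ls_sequence(a[i:i + length])), i)
        for j in range(len(b) - length + 1):
            key = tuple(ls_sequence(b[j:j + length]))
            if key in seen:
                return length, seen[key], j
    return None

s1 = [5, 2, 8, 9, 4]
s2 = [8, 9, 4, 6, 7]
print(ls_sequence(s1))                   # LS-sequence of S1
print(ls_sequence(s2))                   # LS-sequence of S2
print(longest_common_structure(s1, s2))  # (length, start in S1, start in S2)
```

For these inputs the search reports a common segment of length 3 that starts at the third element of S1 and the first element of S2, in agreement with the worked example.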

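This can be represented by the formula:

$$\sum_{k=1}^{n} (n+1-k)\, k \log(k)$$

and since log k ≤ log n we have

$$\sum_{k=1}^{n} (n+1-k)\, k \log(n) = n\log(n)\sum_{k=1}^{n} k + \log(n)\sum_{k=1}^{n} k - \log(n)\sum_{k=1}^{n} k^2$$

$$= n\log(n)\left(\tfrac{1}{2}n^2 + \tfrac{1}{2}n\right) + \log(n)\left(\tfrac{1}{2}n^2 + \tfrac{1}{2}n\right) - \log(n)\left(\tfrac{1}{3}n^3 + \tfrac{1}{2}n^2 + \tfrac{1}{6}n\right)$$

$$= \tfrac{1}{6}n^3 \log n + \tfrac{1}{2}n^2 \log n + \tfrac{1}{3}n \log n.$$

We have shown the following result.

Lemma 1. The complexity of generating LS-sequences, which is
$O(n^3 \log n)$, is bounded by

$$\tfrac{1}{6}n^3 \log n + \tfrac{1}{2}n^2 \log n + \tfrac{1}{3}n \log n.$$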

This process of generating sequences can be performed for each feature 
before any comparisons between features are made because it requires no 
information from any other feature. It needs to be noted that mergesort or 
another stable sorting method is preferred to a sort such as quicksort here 
because of the importance of preserving the correct order in the event that 
two equal values are found. 

We can now analyze the complexity of the actual comparisons between 
features. 

• For each sequence of length n-k+1 a comparison must be exact for the
entire matching section. This requires n-k+1 comparisons. Thus, this is
a simple O(n) operation.

• The number of subsequences of length n-k+1 produced from a sequence
of length n is k.

• Each of the k subsequences of length n-k+1 generated in one feature must
be cross-referenced with each sequence of the same length in feature 2,
which will be $O(k^2)$.

This can be represented by the equation:

$$\sum_{k=1}^{n} k^2 (n-k+1) = (n+1)\sum_{k=1}^{n} k^2 - \sum_{k=1}^{n} k^3$$

$$= (n+1)\left(\tfrac{1}{3}n^3 + \tfrac{1}{2}n^2 + \tfrac{1}{6}n\right) - \left(\tfrac{1}{4}n^4 + \tfrac{1}{2}n^3 + \tfrac{1}{4}n^2\right)$$

$$= (n^4 + 4n^3 + 5n^2 + 2n)/12.$$

We have shown the following theorem.

Theorem 3. The complexity of LS1 when applied to n×n matrices is
$O(n^4)$ and is given by

$$(n^4 + 4n^3 + 5n^2 + 2n)/12.$$

As noted, this is $O(n^4)$, which is similar in complexity to the MMA2
method. However, it is not quite as efficient as the MMA2 method. Consider
for example n = 10. Here the value is 1210.

3.2.2 The Linear Structure algorithm LS2

We can improve the performance of the LS1 method by incorporating
some additional storage and sorting of the linear structures created. If
linear structures with sequences of the same length are stored in a sorted
order (e.g. a binary tree), then binary searching of this information for the
exact match would require only O(log n) time instead of O(n). The ordering
does take extra time, amounting to O(n log(n)), but it can be done off-line,
i.e. before comparing two features. This modifies the previous analysis for
the LS1 method. The LS1 method has a smaller “off-line” part. Below we
estimate the complexity of the “on-line” part of the method LS2. We define the
“on-line” part as the part of the method that is applied for the comparison of
two features from two images.

We have k sequences of length n-k+1 in each feature F1 and F2 to be
compared. Let’s take an arbitrary sequence, say (1,2,5,4,...), generated for
feature F2 with n-k+1 elements. To find a match for this sequence in an
ordered set of sequences for feature F1, we need log(k) comparisons of
sequences, because we have k such sequences.

Next we have n-k+1 elements in each sequence, and we need to compare
them with the same n-k+1 elements in another sequence element by element
for each position, that is, n-k+1 comparisons. Thus, for a single sequence,
say (1,2,5,4,...), we need (n-k+1) log k individual comparisons of values
(indices).

Now we notice that there are at most k different sequences of length
n-k+1, so the total number of comparisons is (n-k+1)*k*log k. Now we
derive the formula for all k:

$$\sum_{k=1}^{n} (n-k+1)\, k \log(k) = \sum_{k=1}^{n} \left((n+1)k - k^2\right) \log(k) = (n+1)\sum_{k=1}^{n} k\log(k) - \sum_{k=1}^{n} k^2 \log(k)$$

Since always k ≤ n, we can substitute log(n) for log(k), which yields:

$$(n+1)\sum_{k=1}^{n} k\log(n) - \sum_{k=1}^{n} k^2\log(n) = (n+1)\log(n)\sum_{k=1}^{n} k - \log(n)\sum_{k=1}^{n} k^2$$

$$= (n+1)\log(n)\left(\tfrac{1}{2}n^2 + \tfrac{1}{2}n\right) - \log(n)\left(\tfrac{1}{3}n^3 + \tfrac{1}{2}n^2 + \tfrac{1}{6}n\right) = \log(n)\,(n^3 + 3n^2 + 2n)/6$$

Thus, we have an upper bound for our analysis of

$$\tfrac{1}{6}n^3\log(n) + \tfrac{1}{2}n^2\log(n) + \tfrac{1}{3}n\log(n).$$

Now we have demonstrated the following.

Theorem 4. The complexity of the “on-line” part of the LS2 method when
applied to two linear features with n×n matrices is $O(n^3 \log n)$.



Table 10. Comparison of complexities of methods

N          10       20        40         80
MMA1     2079    59983   1813266   56319732
MMA2      495     7315    111930    1749060
LS1      1210    16170    235340    3586680
LS2       731     6656     61096     559870


Table 10 shows a comparison of the complexities of the four methods presented
above for some typical n values used in linear feature encoding.
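
The entries of Table 10 can be reproduced from the closed-form expressions derived in this section. The sketch below is a hedged check rather than part of the method: the function names are hypothetical, and the use of a base-2 logarithm for LS2 is an assumption inferred from the tabulated values, since the text does not state the base.

```python
import math

def mma1(n):
    return (2 * n**5 + 5 * n**4 - 5 * n**2 - 2 * n) // 120

def mma2(n):
    return (n**4 + 2 * n**3 - n**2 - 2 * n) // 24

def ls1(n):
    return (n**4 + 4 * n**3 + 5 * n**2 + 2 * n) // 12

def ls2(n):
    # Assumption: logarithm base 2, rounded to the nearest integer.
    return round(math.log2(n) * (n**3 + 3 * n**2 + 2 * n) / 6)

table = {name: [f(n) for n in (10, 20, 40, 80)]
         for name, f in (("MMA1", mma1), ("MMA2", mma2), ("LS1", ls1), ("LS2", ls2))}
for name, row in table.items():
    print(name, row)
```

Under the base-2 assumption the four rows agree with the tabulated values, which also confirms the reconstructed closed forms for MMA1, MMA2 and LS1.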

For cross-referencing more than two features, this LS2 matching process
would need to be extended. We can build this extension using the same
method of storing all linear structures with sequences of equal length in a
sorted order.

For n = 10 we have 1*9 + 2*8 + 3*7 + 4*6 + 5*5 + 6*4 + 7*3 + 8*2 +
9*1 = 165 segments. These segments can then be found in $(n^3 - n)/6$
or $O(n^3)$ time.

If we let m be the total number of features, and we assume for this case
that all features are of length n, then to compare to all other features we will
have $O(m n^3 \log n)$.

LS2 has been shown to have the best processing efficiency. We have not,
however, taken into account the storage requirements for maintaining the list
of sequences, which is $O(n^2)$.

An image with 1000 polylines of 100 points each would require
1000*100*101/2 storage locations. Since all the indexes in this case are <
256, we could use a single byte of storage apiece, resulting in a storage
requirement of 5,050,000 bytes, or about 5 MB, which is quite reasonable.

Of the methods investigated, we have found that the modified linear structure
method LS2 provides the best efficiency with $O(n^3 \log n)$, followed by the
modified matrix matching algorithm MMA2 with complexity $O(n^4)$.

3.3 Robustness of algorithms for images with different 
resolution and noise level 

One of the major conflation challenges is matching features from images
with significantly different resolutions. Feature extraction artifacts can
also contribute to feature disparity. As we have seen in Figure 3, such
features have different levels of smoothness and detail. In Figure 10 we
show the behavior of the SA algorithm in the presence of random noise that
simulates the effect of a difference in resolution. Binary sequential
division (BSD) was applied two times and produced the polyline shown in
Figure 10.

To build structural matrices we compute all Si/Li ratios for the
low-resolution line (straight line) and the high-resolution line with
spikes. Here Li is the length of the i-th polyline segment along the curve
and Si is the length of the straight line between the end points of the
i-th polyline segment. These ratios are 1, 1, 1, 1 for the straight line
and, say, 0.4, 0.45, 0.48, and 0.51 for the curve. Next, we check that all
these ratios are close enough in each feature separately, that is,
|Si/Li - Sj/Lj| < ε for ε = 0.15. All pairs of ratios are within these
limits in both features. Thus, the SA algorithm matches the given features
on this BSD level. Significant differences in ratio values between the two
features indicate that these differences will show up on a deeper level of
BSD and the features will not match at higher levels.
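The ratio check described above can be sketched in a few lines. This is an illustrative sketch only: the helper names `shoulder_ratio` and `ratios_within` are hypothetical, and the check is read here as "the spread of ratios inside one feature is below ε = 0.15".

```python
import math


def shoulder_ratio(points):
    """S/L for one polyline piece: chord length over length along the curve."""
    along = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    chord = math.dist(points[0], points[-1])
    return chord / along


def ratios_within(ratios, eps=0.15):
    """True if every pair of ratios in one feature differs by less than eps."""
    return max(ratios) - min(ratios) < eps


straight = [1.0, 1.0, 1.0, 1.0]    # low-resolution line
spiky = [0.4, 0.45, 0.48, 0.51]    # high-resolution line with spikes

print(ratios_within(straight), ratios_within(spiky))  # True True
# The gap between the two features is what hints at trouble on deeper levels.
print(min(straight) - max(spiky))
```

The last line prints the smallest difference between ratios of the two features; a large value is the kind of clue that, according to the text, the SA algorithm provides and the AA algorithm does not.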

Similarly, we can compute three sequential angles (170°, 165°, and 175°)
for the high-resolution feature. These angles are all 180° for the straight
line. All angle differences are less than the 15° limit for both images.
Thus, they are matched by the AA algorithm on this BSD level. But the AA
algorithm does not give any clue that this match may deteriorate in the
next levels, in contrast with the SA algorithm that provides such clues.
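A corresponding sketch for the angle check follows. It is an assumption-laden illustration: `interior_angle` and `angles_within` are hypothetical names, and the 15° limit is applied to the spread of angles within one feature, as in the example above.

```python
import math


def interior_angle(a, b, c):
    """Angle at vertex b, in degrees, between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return math.degrees(math.atan2(abs(cross), dot))


def angles_within(angles, limit=15.0):
    """True if all angles of one feature differ by less than the limit."""
    return max(angles) - min(angles) < limit


print(interior_angle((0, 0), (1, 0), (2, 0)))  # straight line: 180.0
print(angles_within([170, 165, 175]))          # True
print(angles_within([180, 180, 180]))          # True
```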




Figure 10. This example indicates that the SA algorithm based on lengths is more robust for
similar cases than the AA algorithm based on angles.



4. CONFLATION MEASURES 

Characterizing and evaluating the quality of conflated data relative to a 
given problem is one of the most important open problems in geospatial data 
integration. A major part of this development is producing multidimensional 
measures of the correctness of conflation, since no single measure of quality 
of conflated data sets is appropriate for every conflation problem. 

Traditionally, registration, co-registration, and conflation have been
done with images of a similar type [Brown, 1992]. Some processes have used
every pixel or every vector in each image. For these processes, the feature
space is essentially the image (refer to Figure 4). The similarity metrics
chosen to guide the process and the transformations allowed between the
images have an important bearing on the outcome, as do the presence of
local minima in the search space and the search strategy.

Other conflations use feature spaces that have been derived from the
images using a variety of techniques, including raw intensities,
extractable edges, salient and statistical features, matches against
models, and so on [Zitova & Flusser, 2003]. There are numerous possible
choices for similarity metrics used for both images and features, including
cross-correlation with and without pre-filtering, correlation coefficients,
phase correlations, sums of absolute differences of intensity, sums of
absolute differences of contours, other contour/surface differences, number
of sign changes in point-wise intensity differences, sums of squares of
differences between nearest points, minimum changes in entropy, etc.

Conflation performance is defined by the quality and time of conflation.
We measure the quality of conflation by the certainty that features are
matched correctly. The speed of conflation is measured by the time required
to conflate images. The quality of conflation is a multidimensional
characteristic. In this paper, we focus on how close the features in the
two images are structurally. For each pair of features, this closeness is
measured by the BSD level parameter k. We assume that k > 5 indicates a
high quality of conflation match. The final match is done by checking that
there are no other matching features nearby with the same or a higher level
of match. If there are such features, we continue the BSD process with both
the AA and SA algorithms and all highly matched features to find the pair
with the highest match.

Error estimates. We need to know how good the conflation result is. A
standard approach is to measure the difference between control points after
a matching transformation is completed. In the best case, the distance is 0
for all control points after transformation. Does it tell us how good the
conflation is, or how good the match is for other points? Say we have 10
control points in an image of 1000x1000 pixels, that is, 10 points out of
10^6 points. This means that we evaluated the quality of conflation using
0.001% of the total number of points. The question is how we can evaluate
the quality for the other 99.999% of the points without actually having
10^6 control points. The classical smooth interpolation approach does not
answer this question explicitly. The only statement that can be made is
that smooth continuous interpolations such as linear, polynomial, and
spline interpolations transfer close points to close points. However, how
close is not clear. An expectation is that the error for other points
should be similar to the errors computed for control points. If we have
zero error for 0.001% of the total number of points, we cannot claim that
we will have zero error for the rest of the image too. Thus, an innovative
approach is needed to measure the quality of conflations.

Matching non-control points. A transformation T based on control
points will also transform all other image points, T(x)=y. However, it is
not done specifically for them. It is a byproduct of matching control
points. Why should point x be matched to point y if x is not a control
point? There is a huge number of transformations T’ that match the control
points but differ from T at other points. A transformation T’ can convert
point x to a point y’ that is close to y but not identical to y. Our
conflation method identifies matched non-control points in a more
meaningful way using a combination of measures provided by the AA and SA
algorithms. This is important for enhancing a standard control point
matching approach.

Virtual imagery expert. In addition, we are automatically capturing
some human abilities to match images using context. In Figure 3 we may
assume that an expert knows that the left image is a low-resolution image
and the right image is a high-resolution image. Such context knowledge
permits the expert to match these features despite their apparent
differences. The combined AA and SA algorithm can help to discover this
from the image itself by analyzing shoulder relations.

Three similar methods have been developed for processes similar to ours. 
The first is a technique of matching planar polylines described by Cohen and 
Guibas [Cohen & Guibas, 1997]. We analyzed their task, called the polyline 
shape search problem or PSSP for short, to clarify the applicability of their 
approach to the algebraic structural conflation (ASC) task formalized here 
and to compare the differences in task formalizations and interpretations. 
Our conclusions are as follows: 

(1) Both tasks, PSSP and ASC, assume that matching may require rotation,
scaling, and translation of polylines.

(2) The task of PSSP is to find a part of a polyline A that matches a
polyline B. Polyline B can be a relatively small “template” polyline
(e.g., corner-shape or step-shape). See Figure 11 for examples.

(3) The task of ASC is to find the largest matched part C of two larger
polylines A and B.

(4) PSSP uses two criteria for judging the quality of a match:

a. the error of the match, computed as a distance between polyline
A and the matched part of polyline B after B is transformed by
shifting, rotating and scaling;

b. the length of the match, computed as the number of matched
breakpoints on the polylines multiplied by a (stretching
parameter). This parameter is used to transform B to A.




Our ASC method is more general than the one developed by Cohen and
Guibas because (1) two arbitrary polylines can be searched for similar
portions in each and (2) there are no point limitations on the
determination of distances (see the Appendix of this report).

Another measure of feature shape similarity was defined by Cobb et al. 
[Cobb, Chung, Foley, Petry, Shaw, & Miller 1998]. The idea of the method 
is illustrated in the following Figure 12(a) and 12(b). 



The measure of Cobb et al. is based on the following steps:

1. Bring the features to a standard position with the start point
(x,y) = (0,0) and the end point (x,y) = (1,0) by rotating and rescaling
the features. In this way, feature 1 is presented as a function
P(x), and feature 2 is presented as a function Q(x).

2. Merge nodes: make the number of nodes and their locations on the
x coordinate equal in both features by mapping all nodes
(p0, p1, ..., pn) from feature 1 onto feature 2, and by mapping all
nodes (q0, q1, ..., qm) from feature 2 onto feature 1. The merged
number of nodes is n + m if there are no overlapping nodes in the
features.

3. Compute the Frechet measure of distance L2 between the features as
the square root of the integral of squared differences between the
features using the merged nodes:

    L_2 = \sqrt{ \int_0^1 (P(x) - Q(x))^2 \, dx }

4. Compute the normalized distance between the features, L2/(D1 + D2),
where D1 and D2 are the lengths of the features in the standard
position.
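Steps 2 to 4 can be illustrated with a short sketch. It rests on several assumptions that are not in the source: both features are already in the standard position, each is given as a list of (x, y) nodes with strictly increasing x so that it is a function of x, the features are piecewise linear between nodes, and the normalization divides L2 by the sum of the feature lengths. All function names are hypothetical.

```python
import math


def interp(nodes, x):
    """Piecewise-linear value at x for nodes sorted by strictly increasing x."""
    for (x0, y0), (x1, y1) in zip(nodes, nodes[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the feature range")


def l2_distance(p, q):
    """Square root of the integral of (P(x) - Q(x))^2 over the merged nodes."""
    xs = sorted({x for x, _ in p} | {x for x, _ in q})  # step 2: merge nodes
    total = 0.0
    for x0, x1 in zip(xs, xs[1:]):
        d0 = interp(p, x0) - interp(q, x0)
        d1 = interp(p, x1) - interp(q, x1)
        # Exact integral of a squared linear function on [x0, x1].
        total += (x1 - x0) * (d0 * d0 + d0 * d1 + d1 * d1) / 3.0
    return math.sqrt(total)


def feature_length(nodes):
    return sum(math.dist(a, b) for a, b in zip(nodes, nodes[1:]))


def normalized_distance(p, q):
    """Step 4 under the assumed normalization L2 / (D1 + D2)."""
    return l2_distance(p, q) / (feature_length(p) + feature_length(q))


tent = [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0)]
flat = [(0.0, 0.0), (1.0, 0.0)]
print(l2_distance(tent, flat), normalized_distance(tent, flat))
```

Because the difference of two piecewise-linear functions is linear between merged nodes, the integral can be computed exactly per interval, which is why merging the nodes first is convenient.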

This measure is applicable in the following situations:

1. Features can be arbitrarily rotated and have different sizes.

2. Ratios of scales for x and y are similar for both features. Figure 12(a)
shows this case and Figure 13(b) shows a significant difference in the
ratios in two images, indicated by dotted rectangles. In Figure 13(a),
feature 1 is taken from the image where the ratio of width and height
of the feature bounding box is 0.5, and feature 2 is taken from
image 2, where the ratio is 0.6.

3. Some structural differences are not important. Figure 12(a) shows
structural differences that are not captured by the measure; gray lines
in both Figure 12(a) and Figure 12(b) may give the same similarity
with the black line.

4. Each feature is in the bounding box defined by the start and end
points (see Figure 12(a)). Note that this method is not applicable to
features shown in Figure 12(b).

Figure 12. Structural differences

Figure 13. Examples of scale ratios

Figure 14. Bounding boxes for standardized features

Our ASC method described earlier is applicable to both conflation cases
presented in Figure 14. It captures the structural differences between the
two lines shown in Figure 12(b) using angles.

While the method of Cobb et al. is satisfactory for similar features with
similar lengths, it lacks the ability to determine partial matches of
polylines. Because it uses the Frechet measure of distance L2 between
features, it cannot accurately compare structures that are very different
and will not work at all with others such as those illustrated in
Figure 14(b).

Carswell et al. [Carswell, Wilson & Bertolotto, 2002] have developed a
composite similarity metric to aid in the location of “image-objects” in
images. They combine weighted topology, orientation, and relative-distance
similarity measures for the image components Q and compare them with an
image scene I. For our earlier example of finding a house with an outdoor
swimming pool across a road from a barn and silo, the “image-object” would
consist of these five elements with appropriate topology, orientations, and
distances defined. Their similarity measure provides a percentage match
between the sum of these features Q and similar ones in the image I.

The individual similarity measure weights can be varied to emphasize the
relative importance of different image properties. For this example, one
could weight the topological relationships strongly, the relative-distance
measure moderately and the orientation similarity metric the least.

This approach should give similar results to one that applies algebraic in- 
variants by combining individual features into composite features and then 
proceeding with the conflation/co-registration process in a normal manner 
with loosened distance matching constraints. 



5. GENERALIZATION: IMAGE STRUCTURAL 
SIMILARITY 

It is critical to measure the quality of the whole registration/conflation
process. We described the issue of matching individual features from two
images using the concept of structural equivalence of individual features.
Now this concept is used as a base to define the structural similarity of
two images. No single numeric measure is sufficient to cover the complex
variability of images and matching contexts/situations. Thus, we introduce
a set of measures that more adequately characterizes the quality of
registration/conflation.

We start with the simple situation of one-to-one feature matches.
Every feature in image 1 has a single match in image 2 and the same is true
for image 2. Let n = (n1, n2, ..., nm) be a vector of the structural
similarities defined above for all m pairs of matched features. This vector
provides a measure of the structural similarity of the two images. We can
compute max(n1, n2, ..., nm), min(n1, n2, ..., nm),
average(n1, n2, ..., nm) and a variety of moments of the distribution of
(n1, n2, ..., nm) and use them as similarity indicators if m is a large
number. The difference max(n1, n2, ..., nm) - min(n1, n2, ..., nm) can be
used as an indicator of variability in structural similarity. These
indicators can also serve as change indicators that are important for image
change detection tasks. High values of the elements of vector n mean that
there is no change in image 1 during the period between taking image 1 and
image 2.

Now we can consider the situation with subsets of features. Every feature
in image 1 has a single match in image 2, but some features in image 2 have
no matched feature in image 1. This situation is possible when there is a
new development in the area. Again n = (n1, n2, ..., nm) is a vector of
structural similarities for all m pairs of matched features, but these
pairs do not contain all the features in image 2. Similarly we can compute
max(n1, n2, ..., nm), min(n1, n2, ..., nm), average(n1, n2, ..., nm) and
the other characteristics listed above. However, now even if we have high
values of the elements of vector n, we cannot say that there is no change
during the period between taking image 1 and image 2. Let m1 and m2 be the
number of features in image 1 and image 2, respectively. Then there is a
vector n with m1 elements. A new indicator, m1/m2, which shows the ratio of
common features in the two images, can be added. Similarly, sets of
measures for more complex situations can be introduced.
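The indicators for both situations can be collected in one small helper. This is a sketch with hypothetical names; the similarity values in the example are invented purely for illustration (for instance, BSD levels of matched pairs).

```python
from statistics import mean


def similarity_indicators(n, m1=None, m2=None):
    """Summary indicators for a vector n of per-pair structural similarities.

    m1 and m2 are the feature counts of image 1 and image 2; when both are
    given, the ratio m1/m2 of common features is reported as well.
    """
    indicators = {
        "max": max(n),
        "min": min(n),
        "average": mean(n),
        "range": max(n) - min(n),  # variability in structural similarity
    }
    if m1 is not None and m2 is not None:
        indicators["common_ratio"] = m1 / m2
    return indicators


# Four matched pairs; image 2 contains one extra, unmatched feature.
print(similarity_indicators([6, 7, 5, 8], m1=4, m2=5))
```

A high average together with a common ratio well below 1 corresponds to the second situation in the text: the matched features agree, yet the images still differ because of new features in image 2.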

The structural similarity definitions can be augmented with definitions of 
geometric similarity, based on metric distances. We may need to transform 
one image to the other one before computing geometric similarity. The trans- 
formation can be based on features that have been matched. 

Below we summarize a version of our shoulder-based method for measuring
a structural difference in curvilinear features:

Step 1. Compute L, the length of the line from its start to the middle
point D along the line.

Step 2. Compute the lengths of the two “shoulders”, S1 and S2: S1 is the
length of the straight line [T, D] between the start point T and the
middle point D, and S2 is the length of the straight line [D, E]
between the middle point D and the end point E.

Step 3. Compute the ratios R1 = S1/L1 and R2 = S2/L2, where L1 = L2 = L is
the half feature length computed along the feature. If the ratio R1 is
close to 1, then the first part of the feature is close to a straight
line; the same holds for R2. Find the ratios that are within the
threshold limit.

Step 4. Repeat steps 1-3 for the subfeatures [T, D] and [D, E], and
recursively for their subfeatures, until all ratios are equal to 1
(straight line).

Theorem. For any polyline with n nodes, step 4 must be repeated no
more than n log n times to get all ratios equal to 1.

Proof. This can be shown by considering an extreme case where, after
finding the middle point D, only the single end point will be in the right
“shoulder.” This means that n-1 nodes are in the part between the start
node T and D, including node p_{n-1}. We know that the polyline between
node p_{n-1} and the end node E = p_n is a straight line [p_{n-1}, E].
Above we assumed that D is between them; thus [D, E] is a straight line and
a part of the longer straight line [p_{n-1}, E]. That is, the ratio R2 is
equal to 1 for [D, E], and Step 4 took only one iteration for the
subfeature between D and E.

Now we assume that the same extreme case is true for the left subfeature
from point T to point D and subsequently for all its subfeatures. That is,
we need to repeat step 4 only n times for this extreme case. If our
previous assumption is not true and we have more than one node in the
subfeature between D and E, we can repeat such a binary search process at
most log n times to come to the situation with a single straight line. With
n nodes in total, it may take n log n loops of Step 4 in the worst case.
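The shoulder-based steps can be sketched as a recursive procedure. This is an illustration under stated assumptions, not the authors' implementation: the middle point D is taken as the point halfway along the arc length (it may fall between nodes), a ratio counts as 1 when it is within a tolerance `tol`, and a `max_depth` guard is added because a bend that never coincides with a midpoint would otherwise keep the recursion going. All names are hypothetical.

```python
import math


def path_length(pts):
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))


def split_at_middle(pts):
    """Split a polyline at the point D halfway along its length."""
    half = path_length(pts) / 2.0
    walked = 0.0
    for i, (a, b) in enumerate(zip(pts, pts[1:])):
        seg = math.dist(a, b)
        if walked + seg >= half:
            t = (half - walked) / seg
            d = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            return pts[:i + 1] + [d], [d] + pts[i + 1:]
        walked += seg
    raise ValueError("polyline has zero length")


def shoulder_ratios(pts, tol=1e-9, max_depth=20):
    """Return (depth, R1, R2) for every split made by steps 1-4."""
    out = []

    def recurse(piece, depth):
        left, right = split_at_middle(piece)          # step 1
        half = path_length(piece) / 2.0
        r1 = math.dist(left[0], left[-1]) / half      # steps 2 and 3
        r2 = math.dist(right[0], right[-1]) / half
        out.append((depth, r1, r2))
        if depth >= max_depth:
            return
        if 1.0 - r1 > tol:                            # step 4
            recurse(left, depth + 1)
        if 1.0 - r2 > tol:
            recurse(right, depth + 1)

    recurse(list(pts), 0)
    return out


print(shoulder_ratios([(0, 0), (1, 0), (1, 1)]))
print(shoulder_ratios([(0, 0), (1, 0), (1, 1), (2, 1)], max_depth=3))
```

In the first example the bend lies exactly at the middle point, so both shoulders are straight after a single split. In the second the recursion continues on the halves that still contain a bend, up to the depth guard.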

6. CONCLUSION

This chapter describes a technique of image correlation that does not rely 
on geometrical or topological invariants or on identifying points in common 
with known coordinates. This technique determines relative scales and orien- 
tations and corresponding points by analyzing linear features identified in 
each image and fit with a polyline. While the example presented is from a 
research area at hand, there is nothing intrinsic to this method that ties it to 
the spatial imaging of the earth. It could as easily be applied to any set of 
overlapping images from any discipline and to images produced of a dy- 
namic scene at different times. 

8. EXERCISES AND PROBLEMS

1. Draw a road network with four nodes (road intersections) and 6 roads. 
Define an algebraic system that would represent this road network. 

Tip: A graph representation is a special case of an algebraic system. Rep- 
resent a road network as a graph described by the predicate P(x,y,r) that 
is true if two network nodes x and y are connected by road r. Construct 
predicate P for your road network. 

2. Develop an algebraic representation of your six roads from exercise 1 by 
building matrixes similar to matrix shown in Table for angles between 
road segments. 

3. Combine algebraic representations from exercises 1 and 2 and draw an- 
other road network that would satisfy the combined algebraic representa- 
tion. Analyze the differences between two road networks. For instance, 
could they be different representations of the same road network but pro- 
duced using different sensors in different locations? 

ALGORITHM DEVELOPMENT TECHNOLOGY
FOR CONFLATION AND AREA-BASED 
CONFLATION ALGORITHM 



Michael Kovalerchuk and Boris Kovalerchuk 

Abstract: This chapter presents a technology for conflation algorithm
development with a wide applicability domain. The sequence of steps starts
from vague but relevant expert (or just human) concepts and proceeds toward
an implemented conflation algorithm. The generic steps are illustrated with
examples of specific steps from the development history of an area-based
“shape size ratio” conflation algorithm. The fundamental “shape size ratio”
measure underlying the algorithm has a rather strong invariance property,
including invariance to disproportional scaling. The implemented algorithm
has been integrated into the ArcMap GIS Software as a Plug-in.



Key words: Algorithm development technology, imagery registration,
conflation, geospatial feature, invariants, disproportional scaling,
structural similarity, area ratio method, ArcMap Plug-in.



1. INTRODUCTION 

The overall goal of this chapter is to present an algorithm development
technology for conflation (ADTC). The concept of conflation was identified
in [Cobb et al., 1998; Jensen, Saalfeld et al., 2000; Doytsher et al.,
2001; Edwards, Simpson, 2002] and in Chapters 17-19. This task is an
expansion of the imagery registration task [Brown, 1992; Zitova, Flusser,
2003; Shah, Kumar, 2003; Terzopoulos et al., 2003; Wang et al., 2001]. The
technology is described in terms of its desirable characteristics and the
actual scope of applicability of individual algorithms. In this approach, a
category of images that an algorithm can work with should be identified and
a formal description of images in the category should be provided. The
concept of monotonicity is used as a tool to help analyze and identify the
scope of the algorithm coverage.

The motivation to develop an ADTC beyond the development of an in- 
dividual algorithm follows from three observations: 

1) Conflation is an ill-posed problem from a mathematical viewpoint. 
Thus, there is no chance to develop a single “magic” formal algo- 
rithm that will solve this ill-posed problem for all possible pairs of 
images to be conflated. 

2) There is a chance to mimic a reasonable human practice in manual 
image conflation for specific categories of images as an automatic 
computer algorithm. 

3) Human practice and expertise are fragmented: an individual imagery
analyst may not work with some specific categories of images and
may not provide any input for formal algorithm development.

These observations explain why a technology for algorithm development 
is an attractive alternative to the unrealistic “magic” algorithm. Having said 
this we do not want to fall into another trap, i.e., to end up with thousands of 
“mimic” algorithms with very narrow scopes. This leads us to the require- 
ment that a “mimic” algorithm should be rather invariant to rotation, trans- 
lation and disproportional scaling, and able to work with data of different 
types and modalities (multispectral, SAR and others), resolutions and noise 
level. 

To build a technology we record and analyze a real sequence of algo- 
rithm development. This sequence starts from a very vague idea and ends up 
with a formal algorithm implemented. Below in Table 1 we provide a free 
recording of a real sequence of algorithm development. 

Table 1. Algorithm development recording 

A human expert asks himself the question: “What does it mean for two images to be of 
the same thing?” 

I remember my manual conflation process of two images with two lakes. There is a 
smaller one next to the big one in one image. Look for the big lake in the second image. Aha, 
there is a smaller lake next to the big one on the second image too. The two lake areas must be 
the same. 

I need to develop a formal method for calculation and comparison of lake size ratios from 
this general observation. I recall that in my previous project a simple floodfill-like single 
color algorithm has been developed and used. It breaks up the image into shapes and calcu- 
lates pixel counts. Maybe I can use it here too. In the floodfill-like algorithm, I set the first 
pixel as located in the upper left corner. The next unmarked pixel of the same color is joined
to the flood area and the process continues until all adjacent pixels are tested. This process 
sets a flood area as the first shape. After recording all shapes I can compare their sizes in the 
two images by setting formal comparison relations between them as size ratios. 
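The floodfill-like pass described in this recording can be sketched as follows. The recording does not specify the connectivity or the traversal order, so 4-connectivity and a breadth-first queue are assumptions here, and `shape_sizes` is a hypothetical name. The image is a small grid of color indexes invented for illustration.

```python
from collections import deque


def shape_sizes(image):
    """Label same-color connected regions, scanning from the upper left
    corner, and return a list of (color, pixel_count) in discovery order."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            color = image[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            count = 0
            while queue:
                y, x = queue.popleft()
                count += 1
                # Join unmarked neighbours of the same color to the flood area.
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append((color, count))
    return sizes


image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 1, 1, 0, 0],
]
print(shape_sizes(image))  # [(0, 10), (1, 4), (2, 1)]
```

The pixel counts are the shape sizes from which size ratios between shapes can then be formed.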



In essence this is a top to bottom approach where finally the basic pixel
count is taken as a measure of lake size to be compared. Thus, Table 1
provides some information, but it is not systematic. Why should we focus on
the size ratio and not on other parameters? Having only such random
records, we will be limited in our future ability to produce a general
technology.

Therefore, our first task is finding a systematic way to record an analyst’s 
experience and observations. The systematic way we propose consists of the 
several steps described below. The first one is that an expert writes a list of 
features he believes are relevant and free statements about them such as pre- 
sented in Table 2. 

Table 2. Feature list recording 

Other features can be: (i) three or more “lakes” are in each image, (ii) lakes of
different sizes, (iii) unique features, (iv) linear closed contours, and so on.

Relevance of the features can be measured, for instance, by abilities to build an affine 
transformation of one image to another based on these features. 



Going through various ways of formalizing asymmetric features the ex- 
pert records his/her experience (see Table 3). 

Table 3. Feature formalization and algorithm implementation recording 

Asymmetric features can be measured by the number of pixels to the top and bottom of 
center of the shape or the left and right of center. Coded a rather simple algorithm for
calculating this “asymmetry percentage”. While coding and debugging the function
CalcShapeTopBottomAssymmetryPercent(), which is based on partial pixel counts (shape sizes), came to
the idea of using a ratio of shape sizes as a unique feature, remembering the manual experi- 
ence of a lake conflation case. To match shapes, decided to collect data on all shapes in the 
image, and calculate all shape sizes (pixel counts) and a matrix of their ratios. For each row of 
ratios, trying to find the matching row of ratios in the second image, came to the idea and 
calculation of matching scores. If eight ratios in the row of ratios of shape 1 in image 1 are the 
same (within a small threshold for comparing doubles) as the eight ratios in the row for shape 
3 of image 2, then the match score of shape 1 to shape 3 is 8. Then remembering that 3 points 
are enough for linear conflation, the 3 shapes with highest match scores were picked for the 
transform calculation. The center points of the shapes already calculated for the asymmetry 
algorithm were used as the 3 points for finding (calculating) the affine transform. 
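The ratio matrix and match score from this recording can be sketched as below. The counting rule (each ratio in one row may be paired with at most one ratio in the other row) and the tolerance value are assumptions, and the function names are hypothetical. The shape sizes are invented: image 2 shows the same three shapes at four times the area and in a different order.

```python
def ratio_matrix(sizes):
    """Row i holds the ratios of shape i's size to every shape's size."""
    return [[a / b for b in sizes] for a in sizes]


def match_score(row_a, row_b, tol=1e-6):
    """Number of ratios in row_a that also occur in row_b within tol."""
    remaining = list(row_b)
    score = 0
    for ra in row_a:
        for j, rb in enumerate(remaining):
            if abs(ra - rb) <= tol:
                score += 1
                del remaining[j]  # use each ratio of row_b only once
                break
    return score


sizes_1 = [400, 100, 50]     # pixel counts of shapes in image 1
sizes_2 = [200, 1600, 400]   # same shapes, 4x area, different order
m1, m2 = ratio_matrix(sizes_1), ratio_matrix(sizes_2)

scores = [[match_score(ra, rb) for rb in m2] for ra in m1]
print(scores[0])  # shape 0 of image 1 against each shape of image 2
```

Every row contains the ratio of a shape to itself, which is always 1, so a score of 1 carries no information. In this toy case only the true counterpart reaches the full score of 3.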



These observations give an impulse to more abstract thinking that in- 
cludes getting 14 parameters that intuitively seem relevant (unique features, 
etc). These 14 parameters (after some refinement discussed below in section 
3.2) are presented in Figure 2. Then the expert develops relatively easy to 
calculate pixel-based image measures such as asymmetry percentage, aspect 
ratio and others. These measures are intended to be indicators of parameter 
values. Next the expert assesses the application area (image types) for the 
suggested measure and concludes that it is sizable enough to pursue the 
method further. 

2. STEPS OF THE ALGORITHM DEVELOPMENT
TECHNOLOGY 



The technology contains several steps described in this section in general 
terms and some steps are detailed in further sections with examples. 

At the first step an imagery analyst generates a set of binary parameters
{P} potentially valuable for conflation. Each parameter has only true/false
values, but at this stage no formal procedure is set up to assign these
values (it can be a vague expert opinion).

These parameters are very preliminary and vague. Parameters {P} can
be present in both images I1 and I2 or be comparative for the two images.
For instance, parameter P1 can state that a “unique feature” exists in
image I1. Thus, P1 can be a binary indicator with two values, 0 (false) or
1 (true). Using the predicate notation, this parameter can be written as
P1(I1) = 1 if a unique feature exists. The concept of “unique feature” P1
is not formalized at this stage.

Parameter P2 can indicate that the scales g(I1) and g(I2) of two images I1
and I2 differ by no more than 2 times and can be written in predicate
notation as

P2(I1, I2) = 1 ⇔ g(I1)/g(I2) ≤ 2.

The concept of scale is formalized at this stage.

Another informal intention in generating parameters is to get parameters 
{P} that would be invariant to rotation, translation and disproportional scal- 
ing to be able to generate a robust conflation method. It is not a parameter 
selection criterion at this stage, because often a parameter can be made in- 
variant after some corrections. 

The second step is to evaluate the preliminary sufficiency of the set of
parameters and, if it is sufficient, attempt to find subsets of parameters
that can be sufficient. This informal step intends (1) to narrow the set of
parameters for further work if there are too many parameters or (2) to
expand the set of parameters if it seems insufficient, to avoid
inconsistent and unreliable conflation solutions. A set {Pis} of the
potentially sufficient parameters selected from all parameters {Pi}
identified in step 1 is the result of this step.

The third step is to select some promising subset of parameters from
the set of the potentially sufficient parameters {Pis} defined in step 2.

This step is still informal. It is motivated by the intent to review
parameters {Pis} from a variety of viewpoints. One of them is the already
mentioned invariance of parameters to translation, rotation and scaling.
Others could be the ability to formalize, to measure and to compute each
parameter Pi. The result of this step is a set {Pisp} of promising
parameters selected from all potentially sufficient parameters {Pis}
identified in step 2.

The fourth step is to attempt to formalize selected parameters from
scratch or by using an existing library of formalized components. For
instance, the “unique feature” parameter was formalized as an area ratio
using an already developed pixel count measure.

This is a critical step of transforming the whole conflation process from a
pure expert realm to a more automated area with the long-term intent to
produce a completely automatic conflation algorithm. The creative nature of
this step and its unpredictability are well-known. Steps 1-3 are intended
to increase chances for success in this step.

The result of this step is a set {Pispf} of the formalized parameters
selected from all promising sufficient parameters {Pisp} identified in
step 3.

The fifth step is to analyze formalized parameters {Pispf} for invariance.
This step permits mathematical analysis of the parameter and computational
experiments that compute each parameter Pi for differently modified images
f1(I), f2(I), ..., fn(I):

Pi(f1(I)), Pi(f2(I)), ..., Pi(fn(I))

and check that these values are the same.

Obviously, mathematical analysis can be more difficult conceptually,
because there is no procedure like the one described above for
computational experiments. However, mathematical analysis can provide
stronger invariance conclusions. Computational experiments can only confirm
invariance for the transformations {fi} tested, but a mathematical proof
can cover all possible transformations. As we show below, this was the case
with the mathematically proven statement that the parameter “area ratio” is
an invariant.
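A computational experiment of this kind can be sketched for the “area ratio” parameter. The sketch makes several assumptions that are not in the source: shapes are represented as polygons rather than pixel regions, the transformations are affine maps (rotation, translation, and disproportional scaling are special cases), and all names are hypothetical. As the text notes, such an experiment only confirms invariance for the transformations actually tested.

```python
import math
import random


def polygon_area(pts):
    """Shoelace formula for a simple polygon given as a list of vertices."""
    twice = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        twice += x0 * y1 - x1 * y0
    return abs(twice) / 2.0


def affine(pts, a, b, c, d, tx, ty):
    """Apply (x, y) -> (a*x + b*y + tx, c*x + d*y + ty) to every vertex."""
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in pts]


def area_ratio(shape_1, shape_2):
    return polygon_area(shape_1) / polygon_area(shape_2)


def ratio_is_invariant(shape_1, shape_2, transforms, rel_tol=1e-9):
    """Check that the area ratio is the same under every tested transform."""
    base = area_ratio(shape_1, shape_2)
    return all(
        math.isclose(area_ratio(affine(shape_1, *t), affine(shape_2, *t)),
                     base, rel_tol=rel_tol)
        for t in transforms
    )


big_lake = [(0, 0), (4, 0), (4, 3), (0, 3)]   # area 12
small_lake = [(5, 1), (7, 1), (6, 3)]         # area 2

rng = random.Random(0)
transforms = [(2.0, 0.0, 0.0, 0.5, 10.0, -3.0)]  # disproportional scaling
while len(transforms) < 50:
    t = tuple(rng.uniform(-3, 3) for _ in range(6))
    if abs(t[0] * t[3] - t[1] * t[2]) > 0.1:      # skip near-singular maps
        transforms.append(t)

print(area_ratio(big_lake, small_lake))                  # 6.0
print(ratio_is_invariant(big_lake, small_lake, transforms))
```

The experiment agrees with the standard fact that an affine map multiplies every area by the absolute value of its determinant, so the ratio of two areas in the same image is unchanged.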

The sixth step is to develop a conflation algorithm based on formalized 
invariant parameters. Even though the parameters are formalized, this step is 
still not formal. For some parameters it can be straightforward, but for some 
parameters it can be a very creative process. The “area ratio” invariant pa- 
rameter is an example of the relatively straightforward algorithm derivation. 

The seventh step is to develop an algorithm for finding invariant pa- 
rameters in images. 

The eighth step is to determine algorithm domain limits - determine the 
types of images on which the invariant algorithm from step 6 conflates well. 
If the domain is sizable or significant, we have a valuable conflation algo- 
rithm for the domain. If the domain is small and insignificant, then go back 
to previous steps and retrace the path hoping to get a more universal method. 

The flowchart in Figure 1 shows the general sequence of steps for the
algorithm development methodology. These steps can be looped to get an
improved result.

Figure 1. Overall steps of the ADTC technology



3. PARAMETER IDENTIFICATION STEPS 



3.1 Generation of parameter set 

The identification of parameters is the goal of the first three steps of 
ADTC. The Imagery Virtual Expert System (IVES) described briefly in 
chapter 21 includes several tools including a parameter editor that permits 
recording, editing, and storing parameters. Figure 2 shows the list of 14 pa- 
rameters suggested for further analysis and recorded in IVES. 

These parameters come from the free-form recording of an expert’s
considerations during algorithm development. In Figure 2 all parameters
are binary, with only true/false values.

The parameter “Simple unique geometric feature exists” resembles
parameter $P_1$ discussed in section 2 on the existence of a “unique
feature” in the image. Parameter “Image scales differ no more than two
times” is the same as parameter $P_2$, also described in section 2.

[Screenshot of the IVES Parameter Editor listing the following
parameters:]

    Simple dominant geometric feature exists
    Simple unique geometric feature exists
    Asymmetric unique features exist
    No asymmetric features
    Image scales differ by no more than 2 times
    Image sizes differ by less than 3 times
    Images are not small, rounding errors less impact
    Images have similar complexity level
    Large area of the pictures matches, large overlap
    Longest line match produces matches for major lines
    After major lines match, all lines have neighboring match
    After major lines match, major straight lines do not cross
    After major lines match, major lines have no neighboring match
    One and only one linear mapping matches images (Target)

Figure 2. Preliminary parameters for conflation algorithm development
recorded in IVES system



3.2 Parameter set minimization 

The goal of this step is to further polish the set of parameters. First
we want a preliminary evaluation of whether the set of 14 parameters
could be sufficient to solve the conflation problem with a linear
(affine) transform. At this informal stage, the answer to this question
is simply an expert opinion (yes/no). If the answer is “yes” then we are
interested in narrowing this set of 14 parameters to a smaller subset
that can be sufficient too. This is also done by asking an expert about
subsets $\{P_{is}\}$ of the potentially sufficient parameters selected
from all parameters $\{P_i\}$. The simple exhaustive option here is to
ask an expert to answer yes/no about all $2^{12} = 4096$ subsets.
Obviously such an approach is not realistic and, moreover, is redundant.
The monotone Boolean function approach [Kovalerchuk et al., 1996, 2001]
is more appropriate here.

The approach has two stages. The first stage is to formulate each
parameter in such a way that a “yes” answer from the expert about this
parameter increases the chances that the conflation can be done using
this parameter. For instance, the parameter “Simple symmetric feature
exists” may or may not indicate increased chances that conflation can be
successful. With many similar symmetric features it can be very difficult
to make a unique match. This parameter can be negated and reformulated as
“Asymmetric unique feature exists”. If the expert answers “yes” to this
reformulated question then the chances of finding unique matching points
and features for conflation increase, because we may find unique
matching points and other feature parts that can be uniquely matched
with similar elements in another image. In Figure 2 the parameters are
already reformulated in this way from their original recording. We will
call such a reformulation a positive monotone reformulation. Thus, the
first test for the 14 parameters is to ask the expert whether he/she
agrees that if all 14 parameters are evaluated with “yes” then there are
very good chances that two images can be conflated by an affine or more
complex transform.

If the answer is “yes” then an attempt to minimize the set of parameters
is made. This is the second stage of the approach. Figure 3 shows how
the IVES system supports this stage. The expert is asked whether “yes”
answers for the checked parameters alone could be sufficient for
successful conflation. If the answer is positive then this subset of
parameters becomes a new starting point for further decreasing the set
of parameters.



[Screenshot of the IVES Parameter Editor: the 14 parameters of Figure 2,
each with a checkbox; the checked parameters form the subset under
evaluation.]

Figure 3. Exploring parameter subsets.

There is no reason to ask the expert about subsets that include all
parameters of this subset plus some additional parameters. Such a larger
subset should be sufficient too, by the parameter design performed at
the first stage described above. This is the property of monotonicity of
Boolean functions that we exploit. Thus we are only interested in
cutting down the number of parameters.

A screenshot in Figure 4 illustrates how the IVES system supports this
stage. The expert determines whether the checked parameters could be
sufficient to have a single linear mapping from one image to another.
The expert answers using yes/no buttons for the current subset of
parameters shown in the first column. Previous answers are shown in
other columns. The light color (green) indicates positive answers (yes)
and the dark color (red) indicates negative answers (no). The sequence
of questions is stored in the IVES system in advance. This sequence can
be altered by loading another sequence file. The expert’s answers are
recorded in another file. The question sequence can be optimized by
selecting an appropriate sequence file. If nothing is known in advance
about the expert’s potential answers, the sequence that asks the minimal
number of questions in the most difficult situation can be used. This
best sequence for the worst-case scenario is formalized by the Shannon
criterion and is based on Hansel chains [Kovalerchuk et al., 1996]. See
also Chapter 16 for more detail.
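
The pruning effect of monotonicity can be illustrated with a minimal
Python sketch. It is not the IVES implementation and it does not use the
Hansel-chain ordering; it simply asks about larger subsets first and
skips every question whose answer is already implied. The function name
and the `expert_says_sufficient` callback are hypothetical, introduced
only for this illustration.

```python
from itertools import combinations

def minimal_sufficient_subsets(params, expert_says_sufficient):
    """Find minimal sufficient parameter subsets with few expert questions.

    Monotonicity assumption: every superset of a sufficient subset is
    sufficient, and every subset of an insufficient subset is insufficient.
    """
    params = list(params)
    known_yes, known_no = [], []   # subsets the expert has actually answered
    asked = 0

    def answer(subset):
        nonlocal asked
        if any(y <= subset for y in known_yes):   # contains a sufficient subset
            return True
        if any(subset <= n for n in known_no):    # inside an insufficient subset
            return False
        asked += 1
        result = expert_says_sufficient(subset)
        (known_yes if result else known_no).append(subset)
        return result

    # Larger subsets first, so that a "no" eliminates many smaller subsets.
    sufficient = []
    for size in range(len(params), 0, -1):
        for combo in combinations(params, size):
            subset = frozenset(combo)
            if answer(subset):
                sufficient.append(subset)
    minimal = [s for s in sufficient if not any(t < s for t in sufficient)]
    return minimal, asked

# Hypothetical expert: a subset is sufficient whenever it contains "a".
minimal, asked = minimal_sufficient_subsets("abcd", lambda s: "a" in s)
```

With four parameters there are 15 non-empty subsets, but in this toy case
the expert is asked about fewer of them, because the remaining answers
follow from monotonicity.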



[Screenshot of the IVES Rule Generator: the 14 parameters of Figure 2
with one column per question; the log reads “Current Question 2”,
“Current Question 3”, “Question 15 eliminated”, “Current Question 4”,
“Question 25 eliminated”.]

Figure 4. Expert questioning using monotonicity principle. See also color plates.

The most desirable output from this step would be the conclusion that a
single parameter out of the 14 listed could be sufficient for some sets
of images.

3.3 Single parameter selection

The analysis of the expert’s answers showed that the parameter
“Asymmetric unique features exist” emerged as a single parameter that
could be sufficient for building the algorithm. The motivation for this
selection is as follows. If asymmetric features exist and are preferably
unique then ambiguity in conflation is less likely. Symmetric features
are more likely to cause ambiguity. In the next section we analyze how
this parameter can be formalized. We interpret the concept of an
asymmetric unique feature very generally. Such a feature could be a
cluster of three square buildings of different sizes, asymmetrically
located relative to each other. The buildings themselves are not unique,
but their mutual location can be unique. Thus we do not require that the
feature be a single continuous entity.



4. ATTEMPT TO FORMALIZE PARAMETERS 

In this section, we discuss step 4 by analyzing parameter $P_3$,
“Asymmetric unique features exist”, selected in the previous section.
The first formalization assumes that unique features are represented by
shapes and that uniqueness of shapes is measured by an aspect ratio.
This aspect ratio is computed as $h/w$, where $h$ is the height and $w$
is the width of the rectangular bounding box (BB) around the shape. For
now we can assume that the sides of the BB are parallel to the X and Y
coordinate axes. We use the notation $P_3(I) = \text{true}$, or simply
$P_3(I)$, for image $I$ if $I$ contains a shape $F$ with aspect ratio
$h/w$ such that there is no other shape $F_i$ with aspect ratio
$h_i/w_i$ within 10% of the aspect ratio $h/w$ in the same image:

$$P_3(I) \Leftrightarrow \forall F_i:\ |h_i/w_i - h/w| > 0.1\,h/w$$

Such a shape $F$ is considered to be unique in this image (image A), but
it may not be unique in image B, which is the same as image A but
transformed somehow. This is the subject of investigation in further
steps of the process.

The second formalization for $P_3$ is based on the concept of the area
occupied by the shape in the bounding box. In this formalization,
$P_3(I) = \text{true}$ if less than 60% of the shape’s bounding box is
occupied by the shape $F$ and its internal shapes, and the shape is
asymmetric:

$$P_3(I) \Leftrightarrow S < 0.6\,hw \ \&\ \mathrm{Asymmetric}(F),$$

where $S$ is the area occupied by the shape. The concept of an
asymmetric feature should be formalized later as a predicate
Asymmetric(F). The simplest formalization would be just to assume that
the formalization already written is acceptable, that is, that a shape
(with internals) that occupies less than 60% of its smallest bounding
box is asymmetric.

It is important to notice that we do not view such formalizations as
fully formally equivalent to the intuitive concept of asymmetric unique
features. Rather, they are partial measures that capture only some
aspects of the informal parameter definition.

This parameter can be further specified by requiring that the number of
pixels above the center be 30% greater than the number of pixels below
the center and that no other shape in the same image have a similar
asymmetry percentage within a 5% threshold. Example: there is a shape
with 40% and there is no other shape between 35% and 45% in the picture.
The same characteristic can be computed for testing left and right
asymmetry. Any direction can be selected for testing asymmetry.

For unambiguous conflation we may require that all shapes (with
internals) occupy less than 60% of their smallest bounding box. A unique
aspect ratio can be a better predictor of asymmetry than areas. The area
occupied in the bounding box can be an indicator of convexity and
concavity of the shape. A shape whose area is less than that of a
rhombus should be concave somewhere. For conflating images, unique
features in both images should have similar concavity ratios. If a ratio
is close to 0 then the shape is very concave, and if it is close to 1
then the shape is very convex. Note that some regions can have a ratio
far from 1 but be very convex, e.g., symmetric n-gons. Highly different
concavity ratios are indicators against a match.

The algorithmic procedure for all these formalizations is the same: for
every shape found in the image, search the array of shapes for other
shapes with ratios within 5% of the current shape’s ratio. If no such
shapes are found, the current shape is asymmetric and unique, $P_3 = 1$.
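
The search described above can be sketched in a few lines of Python. This
is only an illustration: the function name is hypothetical, and the 5%
window is interpreted here as relative to the current shape’s ratio (the
text’s 35%-45% example could also be read as an absolute window).

```python
def unique_shapes(ratios, tolerance=0.05):
    """Return indices of shapes whose ratio has no other ratio within
    `tolerance`, measured relative to the current shape's ratio."""
    unique = []
    for i, r in enumerate(ratios):
        clash = any(j != i and abs(other - r) <= tolerance * r
                    for j, other in enumerate(ratios))
        if not clash:
            unique.append(i)
    return unique

# Ratios (e.g., aspect ratios or occupied-area ratios) of four shapes.
ratios = [0.40, 0.41, 0.80, 2.0]
unique = unique_shapes(ratios)
p3 = len(unique) > 0   # P3 is true when at least one unique shape exists
```

Here the first two shapes are too similar to each other to be unique,
while the last two have no close neighbors.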

In general the attempt to formalize a selected parameter (termed step 4
of ADTC) can be viewed as consisting of two substeps:

Step 4.1. Find a local or global characteristic of the image (called an
impact measure) that may have an impact on a true/false parameter. As
mentioned above, the “Ratio of pixels to the top and bottom of center of
the shape” (termed “Central Asymmetry Percentage”) is an example of an
impact measure. The search for impact measures can be performed from
scratch or starting from existing impact measures or their components.

The ADTC technology for algorithm development assumes that a system
built according to this technology records impact measures in the course
of developing conflation algorithms. The recorded measures are then used
for developing new impact measures. When the number and diversity of
recorded measures is large enough, the system could be able to “learn”
new impact measures automatically. The learning is done by modifying and
generalizing recorded measures using a case-based reasoning approach.
Such learning can make the development of measures needed for new
algorithms easier. This is the essence of step 4.2.

Step 4.2. Spread the “Impact Measure”: modify and generalize the impact
measure from step 4.1 to get several similar, easy-to-implement impact
measures. For instance, from an impact measure m = “Shape size as pixel
count”, the new impact measures “Ratio of sizes of two shapes” and
“Asymmetric shape” can be developed. For instance, such a measure can
test that m < 0.6B, where B is the size of the rectangular bounding box
for the shape. Similarly, the measure “Elongated Asymmetric feature” can
be developed, defined as a shape aspect ratio greater than 3.



5. ANALYZE PARAMETER INVARIANCE

Step 5 of ADTC is called Check Invariance for short. All suggested
impact measures are analyzed for invariance to affine transformations.
For instance, the pixel-count-based ratio of sizes of two shapes
mentioned above in step 4 is invariant. See a formal analysis of the
invariance of this impact measure in the invariance theorem below.

Invariance to disproportional scaling (DPS) is one of the most difficult
requirements to meet. In Chapter 19 relations between angles and linear
segment lengths have been exploited to build conflation algorithms.
These relations are relatively robust, that is, they do not change under
limited DPS. Figure 5 shows a robust DPS situation with the angle
relation “>” preserved.

[Figure: panel (a) shows angles with B > A (150 > 50); panel (b) shows
the same shape after disproportional scaling.]

Figure 5. Case of robust invariant angular relation

In Figure 5(a) angle B is greater than angle A, B > A. This property is
preserved under the disproportional scaling shown in Figure 5(b), where
still B' > A'.

Now we can explore the robustness of the relations “=” and “>” between
angles and areas relative to other disproportional scaling. In Figure 6,
area $S_1$ is equal to area $S_2$, $S_1 = S_2$. In addition, angles A
and B are equal, as are angles C and D: A = B, C = D.





Figure 6. Original image 

Figure 7 presents the same image after disproportional scaling, where
the X coordinate was multiplied by $k_x > 1$ and the Y coordinate is not
changed, $k_y = 1$. The relation between angles A and B has changed,
A' < B'. The relation between angles C and D has also changed, C' < D'.
In contrast, the relation between areas $S'_1$ and $S'_2$ is not
changed, $S'_1 = S'_2$.





Figure 7. Image after disproportional scaling 



This can be proved by noticing that the bounding boxes $U_1$ and $U_2$
around rhombuses $S_1$ and $S_2$ remain equal in area after scaling and
each rhombus still occupies half of its bounding box.

More formally, we have $U'_1 = k_x k_y U_1$ and $U'_2 = k_x k_y U_2$.
Using the property $U_1 = U_2$, we conclude that
$k_x k_y U_1 = k_x k_y U_2$ and therefore $U'_1 = U'_2$. Next,
$S'_1 = U'_1/2$ and $S'_2 = U'_2/2$, hence $S'_1 = S'_2$.

This proof is also valid for the general case when $k_y \neq 1$. We can
notice that rotation and translation change neither relations between
angles nor relations between areas. Thus, area relations are not changed
under translation, rotation and disproportional scaling.

Above we considered only the simple relations “=” and “>” between areas.
Other area functions can also be invariants. For instance, the ratio of
areas $S_1/S_2$ is also invariant under disproportional scaling, because
$k_x k_y S_1 / (k_x k_y S_2) = S_1/S_2$.

The consideration above can be converted into a formal theorem
statement. Let $F$ be an affine transformation that combines a
disproportional scaling transformation $K$ with scaling coefficients
$k_x \neq 0$, $k_y \neq 0$, a translation $T$ and a rotation $R$,
$F = K\,T\,R$, where $K$, $T$ and $R$ are transformation matrices.

Let also $G_1$ and $G_2$ be two closed regions and $S_1 = S(G_1)$ and
$S_2 = S(G_2)$ be their areas respectively, where $S(\cdot)$ is an
operator that computes the area of a region.

Theorem: An affine transformation $F$ of the image does not change the
area ratio, $S_1/S_2 = S(G_1)/S(G_2) = S(F(G_1))/S(F(G_2))$, that is,
$F$ is an isomorphism for the area ratio $S(G_1)/S(G_2)$.

The proof follows from the considerations that preceded the theorem. 
The use of generic algebraic system formalism for describing images is con- 
venient because it permits the analysis of the isomorphism and homomorph- 
ism of images based on different image characteristics uniformly. 
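
The theorem can also be checked by a small computational experiment of
the kind described for step 5. The Python sketch below is an assumption
of this illustration, not part of the chapter’s system: regions are
represented as polygons, areas are computed with the shoelace formula,
and the transformation applies scaling, then rotation, then translation.

```python
import math

def polygon_area(points):
    """Area of a simple polygon by the shoelace formula."""
    n = len(points)
    s = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def affine(points, kx, ky, angle, tx, ty):
    """Scale by (kx, ky), rotate by `angle` radians, translate by (tx, ty)."""
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for x, y in points:
        x, y = kx * x, ky * y
        out.append((c * x - s * y + tx, s * x + c * y + ty))
    return out

g1 = [(0, 0), (4, 0), (4, 3), (0, 3)]   # rectangle, area 12
g2 = [(0, 0), (2, 0), (0, 4)]           # triangle, area 4
transform = dict(kx=3.0, ky=0.5, angle=0.7, tx=10.0, ty=-2.0)

ratio_before = polygon_area(g1) / polygon_area(g2)
ratio_after = (polygon_area(affine(g1, **transform))
               / polygon_area(affine(g2, **transform)))
```

Both areas are multiplied by the same factor $k_x k_y$, so the two ratios
agree up to floating-point rounding. Such an experiment only confirms
invariance for the tested transformation, whereas the theorem covers all
of them.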



6. CONFLATION ALGORITHM DEVELOPMENT 



6.1 Develop conflation algorithm using formalized invari- 
ant parameters 

In this section step 6 of ADTC is illustrated with the development of an 
Area Ratio Conflation Algorithm (ARC algorithm for short). The invari- 
ance of the area relations found in the previous section is the base for this 
algorithm. At first we describe the general concept of the algorithm and then 
present it in more detail and more formally using the generalized algebraic 
framework discussed in Chapter 19 for features presented as polylines. Thus 
in this chapter the algebraic framework is extended for shape-based features. 

Two raster images are conflated by finding at least 3 matching uniquely 
sized regions (areas) in both images and using the center points of those re- 
gions as reference points for an affine transform calculation. 
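
For illustration, an affine transform can be recovered from three pairs
of tie points by solving two small linear systems. The following Python
sketch uses Cramer’s rule; the function names are hypothetical and the
code is not taken from the ARC implementation.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def affine_from_tie_points(src, dst):
    """src, dst: three (x, y) points each (e.g., region centers).

    Returns (a, b, c, d, e, f) such that
        x' = a*x + b*y + c,   y' = d*x + e*y + f
    maps every src point to the corresponding dst point.
    """
    m = [[x, y, 1.0] for x, y in src]
    d0 = det3(m)
    if abs(d0) < 1e-12:
        raise ValueError("tie points lie on one straight line")
    coeffs = []
    for k in (0, 1):                    # k = 0 solves for x', k = 1 for y'
        rhs = [p[k] for p in dst]
        for col in range(3):            # Cramer's rule, column by column
            mk = [row[:] for row in m]
            for r in range(3):
                mk[r][col] = rhs[r]
            coeffs.append(det3(mk) / d0)
    return tuple(coeffs)

# Tie points generated by x' = 2x + 5, y' = 3y - 1.
src = [(0, 0), (1, 0), (0, 1)]
dst = [(5, -1), (7, -1), (5, 2)]
params = affine_from_tie_points(src, dst)
```

The collinearity check reflects the requirement that the three reference
points must not lie on the same straight line.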

There are several possible formalizations of the concept of “matching
uniquely sized regions (areas)”. In the ARC algorithm, we use the area
ratio $S_i/S_j$ as the matching characteristic, because of its
invariance shown in the previous section. If two areas in the original
image have a ratio of, say, 0.3, then the ratio between them should
remain the same under any affine transform. Thus in the second image, we
can compute the areas of regions and their ratios $S_i/S_j$, and search
for a 0.3 ratio among them. If only one such ratio is found then the
centers of these regions give us two tie (control) points for building
an affine transform. Finding a third region $S_m$ in both images with
equal ratios $S_i/S_m$ in both images provides the third tie point
needed for an affine transform. This basic idea is adjusted for the
cases where more than one matching triple is found. An additional
uniqueness criterion is introduced in the algorithm based on the
analysis of additional ratios.

Suppose there is an image that contains a large lake of some size and a
small lake whose size is 1/3 of the size of the large lake. This size
ratio (1/3) is invariant to affine transformations. The ratio precision
needs to be adjusted to the scale of the least precise image. Ratios
1/3, 1/2 and 1/4 could match 0.336, 0.52 and 0.27 if the images are of
different scales. The algorithm uses a matching threshold for these
cases.

This logic of the algorithm requires: (1) an algorithm for computing
area ratios and for matching ratios and (2) an algorithm for region
extraction from the image. The first algorithm, called the Ratio
Algorithm, and the second algorithm, called the Vectorizer, are
described below. The development of the second algorithm is the goal of
Step 7 of the ADTC technology.

The Ratio algorithm starts from a set of regions $\{G_{1i}\}$ for image 1
and a set of regions $\{G_{2i}\}$ for image 2 extracted by the
Vectorizer algorithm. The Ratio algorithm computes areas for each region
in both images, $S_{1i} = S(G_{1i})$, $S_{2i} = S(G_{2i})$, as the number
of pixels inside the region. Next this algorithm computes two matrices
$V_1$ and $V_2$. Elements of matrix $V_1 = \{c_{ij}\}$ are
$c_{ij} = S_{1i}/S_{1j}$. Elements of matrix $V_2 = \{q_{ij}\}$ are
defined similarly, $q_{ij} = S_{2i}/S_{2j}$. We assume that all areas
$S_{1i}$ and $S_{2i}$ are positive.

The matrix representation is important because it permits us to convert
the situation to a generic algebraic system framework, with algebraic
systems $A_k = \langle A_k, R_k, \Omega_k \rangle$, where signature
$\Omega_k$ contains the operator $V_k(a_i, a_j)$ represented as a matrix
$V_k$, and to handle the conflation problem uniformly. From this point
uniformity permits us to use a single and already implemented algorithm
to search for matching features in the images. It does not matter for
the algorithms in algebraic form whether elements of $A_k$ are
straight-line segments, polylines, areas, complex combinations, or some
other features. Elements of $A_k$ also can be numeric characteristics of
image components such as the size of region $i$ in image $k$, $S_{ki}$.

Example: Let matrix $V_1$ be computed for regions with areas
$S_{11} = 6$, $S_{12} = 4$, $S_{13} = 2$, $S_{14} = 1$ in image 1 (see
Table 4) and let matrix $V_2$ be computed for areas $S_{21} = 4$,
$S_{22} = 1$, $S_{23} = 6$, $S_{24} = 7$ in image 2 (see Table 5).



Table 4. Matrix of shape size ratios in Image 1

          S11=6   S12=4   S13=2   S14=1
S11=6     1       4/6     2/6     1/6
S12=4     6/4     1       2/4     1/4
S13=2     6/2     4/2     1       1/2
S14=1     6/1     4/1     2/1     1

Table 5. Matrix of shape size ratios in Image 2

          S21=4   S22=1   S23=6   S24=7
S21=4     1       1/4     6/4     7/4
S22=1     4/1     1       6/1     7/1
S23=6     4/6     1/6     1       7/6
S24=7     4/7     1/7     6/7     1


The brute force method would search for equal ratios in the two matrices,
excluding the diagonal. There are six equal numbers in these matrices,
which may indicate match uncertainty. In fact, there are only three
numbers that should be considered (only numbers above the diagonal). The
numbers below the diagonal are the reciprocals $1/c_{ij}$ of the numbers
$c_{ij}$ above the diagonal. This is an unambiguous case, where ratio
6/4 for $S_{11} = 6$, $S_{12} = 4$ is matched to $S_{23} = 6$,
$S_{21} = 4$, that is, region $G_{11}$ in image 1 is matched to region
$G_{23}$ in image 2 and region $G_{12}$ in image 1 is matched to region
$G_{21}$ in image 2. The center of each region is computed as the
average of the coordinates of all points (pixels) of the region.
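
The brute-force comparison of the two ratio matrices can be sketched in
Python as follows. The sketch uses the definition $c_{ij} = S_{1i}/S_{1j}$
from the text, zero-based indexes, and exact fractions instead of a
matching threshold; the function names are hypothetical.

```python
from fractions import Fraction

def ratio_matrix(areas):
    """Matrix with elements areas[i] / areas[j] (integer pixel counts)."""
    return [[Fraction(a, b) for b in areas] for a in areas]

def common_ratios(areas1, areas2):
    """Return pairs ((i, j), (s, t)) with c_ij == q_st.

    Only elements above the diagonal of the first matrix are scanned,
    since elements below the diagonal are their reciprocals.
    """
    v1, v2 = ratio_matrix(areas1), ratio_matrix(areas2)
    n1, n2 = len(areas1), len(areas2)
    matches = []
    for i in range(n1):
        for j in range(i + 1, n1):
            for s in range(n2):
                for t in range(n2):
                    if s != t and v1[i][j] == v2[s][t]:
                        matches.append(((i, j), (s, t)))
    return matches

# Areas from the example: image 1 and image 2.
matches = common_ratios([6, 4, 2, 1], [4, 1, 6, 7])
```

For the example areas this yields three matches. They consistently pair
the regions of sizes 6, 4 and 1 in image 1 with the regions of the same
sizes in image 2.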

The computational efficiency of the algorithm depends on how quickly
matrices $V_1$ and $V_2$ can be computed for images with a large number
of regions. We can notice that matrix $V_1$ has only $n-1$ independent
ratios. All other ratios are computed from them, excluding the diagonal,
which contains all 1’s by definition. The theorem about this is proved
below.

It is reasonable to start from these $n-1$ independent ratios. If all
these ratios in $V_1$ differ from the $n-1$ independent ratios in $V_2$
then the next $n-2$ ratios are computed in both matrices. If they also
have no equal values then the process continues for the next $n-3$
ratios until all elements from the upper part of $V_1$ are exhausted.

If some ratios in this process are equal then there is no reason to
compute ratios that are derived from them. They will be equal too.
Specifically, if $c_{ij} = q_{st}$ and $c_{jk} = q_{tq}$ then we do not
need to compute $c_{ik}$ and $q_{sq}$. They do not add new information
and are equal (see proof below). From $c_{ij} = q_{st}$ we can derive
that region $G_{1i}$ is matched to region $G_{2s}$ and region $G_{1j}$
is matched to region $G_{2t}$.

Similarly, from $c_{jk} = q_{tq}$ we can derive that region $G_{1k}$ is
matched to region $G_{2q}$. Equality of $c_{ik}$ and $q_{sq}$ does not
add new information because it matches region $G_{1i}$ with region
$G_{2s}$ and region $G_{1k}$ with region $G_{2q}$, but these matches
were already established.

The previous consideration was based on the sequential fill of lines that
are parallel to the matrix diagonal. Tables 6 and 7 illustrate how the
third and fourth lines above the diagonal (termed the 3rd and 4th
diagonals) are computed for $V_2$. The light color shows inputs and the
dark color shows output. These computations use the multiplication
formula (1).

Table 6. Matrix of shape size ratios in Image 2

          S21=4   S22=1   S23=6   S24=7
S21=4     1       1/4     6/4     7/4
S22=1     4/1     1       6/1     7/1
S23=6     4/6     1/6     1       7/6
S24=7     4/7     1/7     6/7     1

Table 7. Matrix of shape size ratios in Image 2

          S21=4   S22=1   S23=6   S24=7
S21=4     1       1/4     6/4     7/4
S22=1     4/1     1       6/1     7/1
S23=6     4/6     1/6     1       7/6
S24=7     4/7     1/7     6/7     1

To make the match more visible we can reorder the rows and columns of
the $V_2$ matrix according to $S_{2i}$ values, starting from the largest
value 7 (see Table 8). Now we may notice that the two submatrices with
bold frames are the same in Tables 8 and 7.

Table 8. Matrix of shape size ratios in Image 2, reordered by decreasing area

          S24=7   S23=6   S21=4   S22=1
S24=7     1       6/7     4/7     1/7
S23=6     7/6     1       4/6     1/6
S21=4     7/4     6/4     1       1/4
S22=1     7/1     6/1     4/1     1



Below we discuss the development of the part of the algorithm that finds
$V_{mca}$, a largest common subset in matrices $V_1$ and $V_2$. If this
subset is at least 3x3, i.e., it includes three features, and the
centers of those features are not located on the same straight line,
then an affine transform can be found between images 1 and 2. If the
common subset is much larger than 3x3, then it creates a higher level of
confidence that the conflation of the two images is not accidental.

In this chapter, the search for the common part is more general than the
search for submatrices along the main matrix diagonal in Chapter 19. A
submatrix $\{c_{ij}\}_{i,j = k, k+1, \ldots, m}$ used in Chapter 19 is
formed by a set of consecutive indexes from $k$ to $m$. The matrix
subset is formed by a set of indexes that may have gaps, e.g.,
$T = \{1, 2, 5, 7, 9\}$, that is, $\{c_{ij}\}_{i,j \in T}$. Note that by
reordering matrix rows and columns, we transform a matrix subset into a
submatrix, but this may not speed up the search for the common subset,
because the second matrix can be ordered differently.

An algorithm generates all subsets of $V_1$ of size 3x3 and all subsets
of $V_2$ of size 3x3. We denote them as $V_1(T_r)$ and $V_2(T_e)$, where
$T_r$ and $T_e$ are index sets that identify the three features that
form a 3x3 subset. Then the similarity of every pair $V_1(T_r)$ and
$V_2(T_e)$ is tested using the Euclidean distance (D):

$$D(V_1(T_r), V_2(T_e)) = \sqrt{\sum_{i,j \in T_r} \left(c_{ij} - q_{\varphi(i)\varphi(j)}\right)^2}$$

and finding a pair with the smallest distance that is below the
threshold $L$, where the mapping $\varphi$ matches features in the two
images:

$$D(V_1(T_r), V_2(T_e)) \to \min, \qquad D(V_1(T_r), V_2(T_e)) < L.$$
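
A minimal Python sketch of this exhaustive search over 3x3 subsets is
given below. It is an illustration under stated assumptions: the function
name is hypothetical, the candidate mapping between features is
represented by the ordering of the triple chosen in the second image, and
ratios are computed directly from region areas.

```python
import math
from itertools import combinations, permutations

def best_triple(areas1, areas2, threshold):
    """Find the closest pair of 3x3 ratio subsets.

    Returns (distance, triple_in_image_1, triple_in_image_2), where the
    k-th index of the first triple is mapped to the k-th index of the
    second, or None if no pair is closer than `threshold`.
    """
    best = None
    for t_r in combinations(range(len(areas1)), 3):
        # Each ordering of three regions in image 2 is a candidate mapping.
        for t_e in permutations(range(len(areas2)), 3):
            d = math.sqrt(sum(
                (areas1[t_r[a]] / areas1[t_r[b]]
                 - areas2[t_e[a]] / areas2[t_e[b]]) ** 2
                for a in range(3) for b in range(3)))
            if d < threshold and (best is None or d < best[0]):
                best = (d, t_r, t_e)
    return best

# Example areas from Tables 4 and 5 (zero-based region indexes).
result = best_triple([6, 4, 2, 1], [4, 1, 6, 7], threshold=0.05)
```

For the example areas the search returns the regions of sizes 6, 4 and 1
in both images with zero distance. The cost grows quickly with the number
of regions, which is why the text discusses ways to avoid computing
redundant ratios.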



6.2 Proofs 



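Lemma. If $c_{ij} = q_{st}$ and $c_{jk} = q_{tq}$ then $c_{ik} = q_{sq}$.

Proof. The proof is based on

$$c_{ij} \cdot c_{jk} = c_{ik}. \qquad (1)$$

This follows from the definitions of $c_{ij}$, $c_{jk}$ and $c_{ik}$:
$c_{ij} = S_i/S_j$, $c_{jk} = S_j/S_k$ and $c_{ik} = S_i/S_k$, where

$$c_{ij} \cdot c_{jk} = (S_i/S_j)(S_j/S_k) = S_i/S_k = c_{ik}.$$

Now using $c_{ij} = q_{st}$ and $c_{jk} = q_{tq}$ in (1) we get

$$c_{ik} = c_{ij} \cdot c_{jk} = q_{st} \cdot q_{tq} = q_{sq}.$$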

The use of formula (1) is shown graphically in Table 9, where elements
$c_{23} = 2/4$ and $c_{34} = 1/2$ in light cells produce element
$c_{24} = (2/4)(1/2) = 1/4$ in the dark cell. Cells are marked similarly
in the tables above for computing element $c_{13}$.

Together elements $c_{13}$ and $c_{24}$ form a line that is parallel to
the diagonal and contains $n-2$ elements, where $n$ is the size of the
matrix. In this example $n = 4$. The last element of the matrix $V_1$ is
$c_{14}$. The computation of this element from elements $c_{13}$ and
$c_{34}$ is illustrated in Table 10. All these tables indicate an
important property that is formulated as a theorem below.

Table 9. Matrix of shape size ratios in Image 1
[Table body not recovered; per the text, light cells $c_{23} = 2/4$ and
$c_{34} = 1/2$ produce the dark cell $c_{24} = 1/4$.]

Table 10. Matrix of shape size ratios in Image 2
[Table body not recovered; per the text, it shows $c_{14}$ computed from
$c_{13}$ and $c_{34}$.]

Theorem: If all elements $c_{j,j+1}$ ($j = 1, 2, \ldots, n-1$) of an
$n \times n$ matrix $V_1$ are given then all other elements of $V_1$
can be restored.

Proof. We can omit computing the elements of $V_1$ under the diagonal;
all these elements are $c_{ij} = 1/c_{ji}$ for elements $c_{ji}$ above
the diagonal. This directly follows from the definition of the elements
of $V_1$. The rest of the proof is provided by designing an iterative
process that computes all other elements of $V_1$.

Step 1. Compute all elements $c_{j,j+2}$ by using elements $c_{j,j+1}$
in formula (1):

$$c_{j,j+1} \cdot c_{j+1,j+2} = c_{j,j+2}.$$

Step 2. Compute all elements $c_{j,j+3}$ using formula (1) and elements
$c_{j,j+2}$ computed in Step 1:

$$c_{j,j+2} \cdot c_{j+2,j+3} = c_{j,j+3}.$$

General Step k. Compute all elements $c_{j,j+k}$ using formula (1) and
elements $c_{j,j+k-1}$ computed in Step k-1:

$$c_{j,j+k-1} \cdot c_{j+k-1,j+k} = c_{j,j+k}.$$

Repeat Step k while $j + k \le n$, that is, until the whole matrix $V_1$
is exhausted.
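
The iterative process of the proof can be written as a short Python
sketch. It assumes zero-based indexes and the definition
$c_{ij} = S_i/S_j$; the function name is hypothetical.

```python
from fractions import Fraction

def restore_ratio_matrix(superdiagonal):
    """Rebuild an n x n ratio matrix from its n-1 elements c[j][j+1]."""
    n = len(superdiagonal) + 1
    c = [[None] * n for _ in range(n)]
    for j in range(n):
        c[j][j] = Fraction(1)                      # diagonal is all 1's
    for j in range(n - 1):
        c[j][j + 1] = Fraction(superdiagonal[j])   # given elements
    for k in range(2, n):                          # k-th line above the diagonal
        for j in range(n - k):
            # formula (1): c[j][j+k-1] * c[j+k-1][j+k] = c[j][j+k]
            c[j][j + k] = c[j][j + k - 1] * c[j + k - 1][j + k]
    for i in range(n):
        for j in range(i):
            c[i][j] = 1 / c[j][i]                  # reciprocals below the diagonal
    return c

# Areas 6, 4, 2, 1 give the independent ratios 6/4, 4/2 and 2/1.
v1 = restore_ratio_matrix([Fraction(6, 4), 2, 2])
```

Only $n-1$ ratios are supplied, and each remaining element above the
diagonal costs one multiplication.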

6.3 Develop algorithm for finding invariant parameters in 
images 

Finding invariant parameters constitutes step 7 of ADTC. The ARC
algorithm uses area ratios as invariant parameters for conflation. To be
able to run this algorithm we need to find/extract regions and compute
their areas, and compute centers for the matched ones. This work is done
by a Vectorizer algorithm that sharpens images and finds regions using a
flood-fill method from computer graphics [Angel, 2000] that starts from
a seed point and looks recursively at the colors of adjacent pixels,
including diagonal neighbors, until all neighboring pixels of the same
color are found. The set of these pixels is considered a single region.
The number of pixels in the region is considered its area/size. The set
of all extracted regions is the input of the ARC algorithm.
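
The region extraction can be illustrated with the following Python
sketch. It is not the Vectorizer itself: sharpening is omitted, the image
is assumed to be a list of rows of color values, and an explicit stack
replaces recursion to avoid deep call chains. The function names are
hypothetical.

```python
def extract_regions(image):
    """Split an image into regions of equal color (8-connected flood fill).

    Returns a list of regions, each a list of (row, col) pixels.
    """
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r0 in range(h):
        for c0 in range(w):
            if seen[r0][c0]:
                continue
            color = image[r0][c0]          # seed point of a new region
            seen[r0][c0] = True
            stack, region = [(r0, c0)], []
            while stack:
                r, c = stack.pop()
                region.append((r, c))
                for dr in (-1, 0, 1):      # includes diagonal neighbors
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < h and 0 <= cc < w
                                and not seen[rr][cc]
                                and image[rr][cc] == color):
                            seen[rr][cc] = True
                            stack.append((rr, cc))
            regions.append(region)
    return regions

def area_and_center(region):
    """Area as pixel count; center as the average pixel coordinates."""
    n = len(region)
    return n, (sum(r for r, _ in region) / n, sum(c for _, c in region) / n)

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 0, 0, 0],
]
regions = extract_regions(image)
areas = [len(region) for region in regions]
```

In this sketch the background color also forms a region; a practical
implementation would decide which regions to pass on to the ratio
computation.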

After conflation is done by the ARC algorithm, the conflation quality
can be evaluated by visual inspection of the conflated images and by a
computational procedure based on the absolute and relative difference
between matched regions.

The difference of two regions is the XOR (exclusive OR) of the pixels of
regions $G$ and $G'$. The absolute difference of regions $G$ and $G'$,
$\Delta(G, G')$, is computed as the number of pixels in the difference
of regions $G$ and $G'$:

$$\Delta(G, G') = S(\mathrm{XOR}(F(G), G'))$$

and the relative difference of the regions is

$$\rho(G, G') = \Delta(G, G') / (S(G) + S(G')).$$

The total difference between three matched regions $\{G\}$ and $\{G'\}$
of images Im1 and Im2 found by the ARC algorithm is

$$\rho(\{G\}, \{G'\}) = \rho(G_1, G'_1) + \rho(G_2, G'_2) + \rho(G_3, G'_3).$$

Similarly the relative difference of matched regions is

$$\rho(\{G\}, \{G'\}) = \Big[\sum_{i=1,2,3} \Delta(G_i, G'_i)\Big] \Big/ \Big[\sum_{i=1,2,3} \big(S(G_i) + S(G'_i)\big)\Big].$$

The maximum of the relative difference is 1 and the minimum is 0.
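
These measures are straightforward to compute when regions are stored as
sets of pixels. The Python sketch below assumes that the first region of
each pair has already been transformed into the coordinates of the second
image; the function names are hypothetical.

```python
def absolute_difference(g, g_prime):
    """Number of pixels that belong to exactly one of the two regions (XOR)."""
    return len(set(g) ^ set(g_prime))

def relative_difference(g, g_prime):
    """Absolute difference divided by the sum of the two region sizes."""
    return absolute_difference(g, g_prime) / (len(set(g)) + len(set(g_prime)))

def relative_difference_of_matches(pairs):
    """Relative difference over all matched pairs (the last formula above)."""
    numerator = sum(absolute_difference(g, gp) for g, gp in pairs)
    denominator = sum(len(set(g)) + len(set(gp)) for g, gp in pairs)
    return numerator / denominator

# Two 2x2 regions that overlap in one column of two pixels.
g = {(0, 0), (0, 1), (1, 0), (1, 1)}
g_prime = {(0, 1), (1, 1), (0, 2), (1, 2)}
delta = absolute_difference(g, g_prime)
rho = relative_difference(g, g_prime)
```

Identical regions give 0 and disjoint regions give 1, which matches the
stated range of the relative difference.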



7. DETERMINE CONFLATABLE IMAGES AND 
ALGORITHM LIMITATIONS 

Determining conflatable images and algorithm limitations constitutes
step 8 of ADTC. Pixel-based size ratio methods work well on images that
are feature-rich enough to have at least 3 uniquely sized regions in both
images. Some known limitations are:

1) Regions need to be completely contained in the images, not partially
cut off by the image edge.

2) Images with a large number of colors may require pre-sharpening.

If only half of a shape (e.g., a lake) is present in one of the images and
the image border goes through the lake, the region size ratios would not
match and another region would have to be used for conflation. If another
region is not found in the image, then the image is too feature-poor for
successful conflation with this region size ratio method. A line-based method
may be able to conflate the cut-off lake based on the shape of its coastline.
Thus, the line-based conflation methods described in Chapter 19 need to be run
on region-poor cases. Below we summarize known limitations that are solvable
by other methods:

• region-poor image because of cut-off regions (line-based methods may work);

• lack of unique regions, i.e., many equal-sized regions (line-based
methods may work);

• smooth color transition (pre-aggregate pixels into larger regions and
then run a region-based method). Note that in this case differences in
aggregation color thresholds may preclude a match.

There are also limitations that are not solvable by other methods: 

• no regions or other features at all, e.g., two featureless pieces of de- 
sert; 

• all regions are the same; this case is theoretically ambiguous given
current data;

• heavy darkening or lightening can melt together two regions. 

It seems that the ARC algorithm's domain of images with three or more
regions of different sizes, completely contained in the images and with a
modest number of colors, is sizable and significant. Also, the Area Ratio
Conflation Algorithm (ARC) described in this chapter is a good approximation
for the whole conflation problem in a sizable area of images. It is important
that the applicability of the algorithm to particular images can be tested in
advance.
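
Such an advance test could look like the sketch below. It is only an illustration under stated assumptions: regions are given as lists of (row, column) pixels, a region that touches the image border is conservatively treated as possibly cut off, and the names touches_border, usable_areas and is_conflatable are hypothetical.

```python
def touches_border(pixels, rows, cols):
    """True if any pixel of the region lies on the image border."""
    return any(r in (0, rows - 1) or c in (0, cols - 1) for r, c in pixels)


def usable_areas(regions, rows, cols):
    """Areas of regions that lie fully inside the image and have a unique size."""
    inside = [len(p) for p in regions if not touches_border(p, rows, cols)]
    return sorted(a for a in set(inside) if inside.count(a) == 1)


def is_conflatable(regions_a, shape_a, regions_b, shape_b, min_regions=3):
    """Check that both images offer at least min_regions uniquely sized regions."""
    return (len(usable_areas(regions_a, *shape_a)) >= min_regions
            and len(usable_areas(regions_b, *shape_b)) >= min_regions)
```

A pair of images that fails this check would be a candidate for the line-based methods mentioned above rather than for a region size ratio method.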



8. SOFTWARE AND COMPUTATIONAL 
EXPERIMENT 

The following series of screenshots demonstrates the initial implementation
of the Area Ratio Conflation Algorithm (ARC) for raster images, implemented
as an ArcMap Plug-in. ArcMap is the central application in the ArcGIS Desktop
software developed by ESRI. It supports all map-based tasks, including
cartography, map analysis, and editing [ArcGIS, 2004].

User actions needed to use the Plug-in include loading the two images
into ArcMap and clicking a “Conflate” button. The Plug-in allows the
algorithm to run in visualized or non-visualized mode based on the user's
selection. The screenshots also demonstrate that the algorithm is capable of
handling disproportionate scaling.

In the following screenshots (Figures 8-14) we consider the conflation of 
various sections of an aerial photo and a topographic map of the same area 
with three lakes. 

The screenshot in Figure 8 shows two images before conflation is ap- 
plied. The image on the left is a preprocessed aerial photo rotated 90 degrees 
and stretched 2 times in the y direction, thus being disproportionally scaled. 




Figure 8. Two images before conflation is applied. A part of the aerial photo is rotated 90 
degrees and stretched 2 times in the y direction, thus being disproportionally scaled. 



The shape extraction stage is shown in Figure 9 for both images from
Figure 8. One image has 6 shapes; the other has 23 shapes. Shapes 3, 4, and 6
in image 1 are the 3 lakes, and shapes 14, 13, and 15 are the 3 lakes in
image 2. Shape 3 matches shape 14, shape 4 matches shape 13, and shape 6
matches shape 15. In the visualization, the top left corner of the shape label
corresponds to the center point of the shape with that number. In this case
the size ratios of the 3 lakes are used to automatically match up the images.
The center points of the lake shapes are used to calculate the transform
needed for conflation. The program makes this determination automatically.
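
How three matched center points determine a transform can be sketched as follows. This is an assumption-laden illustration rather than the Plug-in's code: it models the transform as a general affine map x' = a·x + b·y + c, y' = d·x + e·y + f and solves for the six coefficients with Cramer's rule; the function names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve the 3x3 linear system m * x = rhs by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("center points are collinear; transform is not unique")
    solution = []
    for k in range(3):
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = rhs[i]  # replace column k with the right-hand side
        solution.append(det3(mk) / d)
    return solution


def affine_from_centers(src, dst):
    """Affine coefficients (a, b, c, d, e, f) mapping three src points to dst."""
    m = [[x, y, 1.0] for x, y in src]
    a, b, c = solve3(m, [p[0] for p in dst])
    d, e, f = solve3(m, [p[1] for p in dst])
    return (a, b, c, d, e, f)


def apply_affine(t, point):
    """Apply the affine coefficients t to a single (x, y) point."""
    a, b, c, d, e, f = t
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)
```

Three non-collinear centers are exactly enough to fix the six coefficients, which is consistent with the requirement of at least three matched regions.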

Figure 10 shows the two images with matched features after rotation,
translation and scaling are applied.

Figure 9. Two images at shape extraction stage. One of the images is scaled
disproportionally. See also color plates.




Figure 10. Two images with matched features are shown after rotation, 
translation and scaling are applied 

Figure 11 shows the two full images of the sharpened aerial photo and
the topographic map before conflation. The aerial photo is rotated 90
degrees.

Figure 11. Two full images of the sharpened aerial photo and topographic map before conflation.
The aerial photo is rotated 90 degrees.



Figure 12 shows the visualization of the shape extraction and shape 
matching stage of the conflation of the two full images of the sharpened ae- 
rial photo and the topographic map. No scale difference is present here. 




Figure 12. Shape extraction and shape match visualization. See also color plates. 
The next screenshot (Figure 13) shows the two complete images after the
conflation algorithm has been applied to the full-size original images.

Figure 14(a) shows two smaller images, which are parts of the aerial
photo and topographic map, before conflation. Figure 14(b) shows the results
of conflating these smaller images.




Figure 13. Two complete images after match applied to whole original images. 




(a) Before conflation. 




(b) Results of conflation 



Figure 14. Conflation of two smaller images, which are parts of the aerial photo and 
topographic map. See also color plates. 
9. CONCLUSION

Human practice and expertise are fragmented: an individual imagery analyst
may not work with some specific categories of images and may not provide
sufficient input for formal algorithm development. A technology for
algorithm development is designed for such situations to integrate the
collective expertise of imagery analysts. The chapter presented an algorithm
development technology for conflation (ADTC) that goes beyond the
development of an individual algorithm. ADTC is described in terms of its
desirable characteristics and the actual scope of applicability of individual
algorithms. The concept of monotonicity is used as a tool to help analyze and
identify the scope of the algorithm coverage. The ADTC technology contains
eight steps described in this chapter.

The ADTC technology was illustrated with the ARC algorithm that was 
developed in accordance with ADTC. The experiments showed that the ARC 
algorithm could handle rotation, translation, and scaling (including dispro- 
portional scaling). The ARC uses area ratios that are generally invariant to 
these transformations. The experiments also showed that the method could 
handle the images with a significant number of features. 
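
The invariance of area ratios can be checked numerically with a short sketch. It assumes idealized polygonal regions instead of pixel regions (for pixel counts the invariance holds only approximately), and the lake outlines and transform parameters are hypothetical values chosen for illustration.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its (x, y) vertices (shoelace formula)."""
    n = len(vertices)
    twice_area = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                     - vertices[(i + 1) % n][0] * vertices[i][1]
                     for i in range(n))
    return abs(twice_area) / 2.0


def affine(vertices, a, b, c, d, e, f):
    """Map each vertex by x' = a*x + b*y + c, y' = d*x + e*y + f."""
    return [(a * x + b * y + c, d * x + e * y + f) for x, y in vertices]


# Two hypothetical lake outlines.
lake_1 = [(0, 0), (4, 0), (4, 3), (0, 3)]   # area 12
lake_2 = [(10, 10), (12, 10), (11, 13)]     # area 3

# Rotation by 90 degrees, stretching 2 times in y, then a translation.
params = (0.0, -1.0, 5.0, 2.0, 0.0, 3.0)

ratio_before = polygon_area(lake_1) / polygon_area(lake_2)
ratio_after = (polygon_area(affine(lake_1, *params))
               / polygon_area(affine(lake_2, *params)))
```

Every area is multiplied by the same factor (the absolute value of the determinant a·e − b·d), so the ratio of two areas is unchanged even under disproportional scaling.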
11. EXERCISES AND PROBLEMS

Advanced 

1. Design a conflation algorithm using ADTC technology.

2. Evaluate invariance of your algorithm for affine transforms.

3. Evaluate the scope of your algorithm.

4. Write code that implements algorithm ARC.

5. Write code that implements your algorithm designed in exercise 1.

6. Compare ARC and your algorithms conceptually and in computational
experiments.
Abstract: The unique human expertise in imagery analysis should be preserved and
shared with other imagery analysts to improve image analysis and decision-making.
Such knowledge can serve as a corporate memory and be a base for an
imagery virtual expert. The core problem in reaching this goal is constructing a
methodology and tools that can assist in building the knowledge base of imagery
analysis. This chapter provides a framework for an imagery virtual expert
system that supports imagery registration and conflation tasks. The approach
involves three strategies: (1) recording expertise on-the-fly, (2) extracting
information from the expert in an optimized way using the theory of
monotone Boolean functions, and (3) using iconized ontologies to build a
conflation method.

Key words: Imagery virtual expert, ontology, knowledge base, rule generation
optimization, monotone Boolean function, registration, conflation.



1. INTRODUCTION 

The goal of imagery registration is to provide geospatial coordinates for
the image. The goal of imagery conflation is the correlation and fusion of
two or more images or geospatial databases. “The process of transferring
information (including more accurate coordinates) from one geospatial database
to another is known as ‘conflation’” [FGDC, 2000]. Typically, the result
of conflation is a combined image produced from two or more images
with: (1) matched features from different images and (2) transformations
that are needed to produce a single consistent image. Note that registration
of a new image can be done by conflating it with a registered image.
Such a way of registration can be useful if there is a lack of reliable metadata
that provide registration directly. This consideration motivates us to
concentrate on the conflation task in this chapter. Recently, conflation has
been viewed as a matching technique that fuses imagery data and preserves
inconsistencies (e.g., inconsistencies between high and low resolution maps,
the “best map” concept [Edwards, Simpson, 2002]). This approach tries to
preserve the pluralism of multisource data. The traditional approach [USGS,
1998] uses an “artistic” match of elevation edges. If a road has a break on
the borderline of two maps, then a “corrected” road section starts at some
distance from the border on both sides and connects the two disparate lines.
This new line is artistically perfect, but no real road may exist on the
ground in that location (see Figure 1).




Figure 1. Initial mismatch and “artistic” conflations 



Why design virtual experts for conflation? Can the conflation problem
be solved by designing a sophisticated mathematical procedure without relying
on an expert’s knowledge? In essence, the conflation problem is a conflict
resolution problem between disparate data. Inconsistencies in multisource
data can be due to scale, resolution, compilation standards, operator
license, source accuracy, registration, sensor characteristics, currency,
temporality, or errors [Edwards, Simpson, 2002]. The conflict resolution
strategies are highly context and task dependent. The dependence of
conflation on a specific task is discussed in Chapters 17, 18 and 19.

In solving a conflation problem, experts are unique in extracting and using
non-formalized context and in linking it with the task at hand (e.g., finding
the best route). Unfortunately, few if any contexts are explicitly formalized
and generalized for use in conflating other images. It is common that
the context of each image is unique and not recorded. For example, an expert
conflating two specific images may match feature F1 with feature F3,
although the distance between features F1 and F2 is smaller than the distance
between features F1 and F3. The reasoning (that is typically not recorded)
behind this decision could be as follows. The expert analyzed the whole image
as a context for the decision. The expert noticed that both features F1
and F3 are small road segments and are parts of much larger road systems A
and B that are structurally similar, but features F1 and F2 have no such link.
This conclusion is very specific to a given pair of images and the roads on
these images. The expert did not have any formal definition of structural
similarity in this reasoning. Thus, this expert’s reasoning may not be
sufficient for implementation in an automatic system for conflating other
images. Moreover, the informal similarity the expert used for one pair of
images can differ from the similarity the same expert will use for two other
images.

There are two known approaches to incorporating context: (1) formalize
context for each individual image and task directly and (2) generalize context
in the form of expert rules. In the first approach, the challenge is that there
are too many images and tasks and there is no unified technique for context
formalization. The second approach is more general and more feasible,
but in some cases it may not match a particular context and task; thus a human
expert needs to take a look.



2. SHORTCOMINGS OF PREVIOUS ATTEMPTS TO 
DEAL WITH THE SUBJECT 

Currently, even large knowledge bases cannot answer many questions
that are within their scope. The real world is too dynamic, uncertain, and
complex for even modern knowledge bases. The conflation/registration problem
is an example of such a real-world problem. DARPA’s program “High-Performance
Knowledge Bases” [HPKB, 1996], which started in 1997, set
the critical size barrier for large knowledge bases at around the 10,000
axiom/rule limit. At that time, DARPA’s goal was to build technology
that would scale up to 100,000 axiom/rule knowledge base systems [HPKB,
1996]. The DARPA program “Rapid Knowledge Formation” [RKF, 1999]
formulated new requirements that include parallel entry of knowledge by
teams of 25-50 individuals (end users) for test tasks such as crisis
management and battlespace understanding.

According to DARPA, using High Performance Knowledge Base technology
a 5-person team can create knowledge at a rate of 40 axioms per
hour and 100K axioms per year. After that, DARPA stated a new goal: the
creation of new knowledge at a rate of 400 axioms per hour. Next, DARPA
identified the criterion of comprehensiveness of the knowledge base at the
level of a million axioms. The PARKA project research team at the University
of Maryland extracted 125,913 assertions (facts) from CIA World Fact
book pages on the World-Wide Web using a web robot [VLKB, 1998].
Thus, a million axioms is comparable to encoding the knowledge from
eight books like the CIA World Fact book. Next, notice that these assertions
are not rules, but facts such as “economy_imports#tajikistan#$690 million
{1995},” that is, Tajikistan’s imports were $690 million in 1995. In addition,
those assertions have been extracted from written text using a web robot.
For the conflation task, there is no text available to use as a source for a web
robot. This means that we need to build such written sources and extract
rules from experts directly.

To understand what the million-axiom size of the knowledge base means 
in more detail we need to clarify the concept of axiom used by DARPA. The 
same DARPA source provides an example of the axiom: 

∀x, p1, p2: vehicle(x) ⇐ physical_object(x) and self-propelled(x) and
can(move(x), p1, p2).

This is a relatively simple statement with three basic statements combined
using the AND operator. For complex tasks with interdependent attributes,
axioms can involve more than ten statements connected by the AND operator.
Correspondingly, the time for extracting these rules can be much longer.
The rules can also be much less trivial and less certain, as in conflation
problems.

Let us consider the rate and quality of knowledge base development
reached in 1999 in the High Performance Knowledge Base program [RKF,
1999]. The maximum number of axioms was 90,000 per 10 months and the
smallest number of axioms was 2300 per seven months. Depending on the
domain and the problem, 90K may not be enough or 2300 may be enough.
Also note that 90,000 < 2^17 = 131,072, which means that for designing a
complete knowledge base with 17 binary attributes we need even more axioms.

Building comprehensive virtual experts. Let us illustrate the important
question of completeness and comprehensiveness of the knowledge base. A
knowledge base will be called complete if for a given set of attributes the
knowledge base can generate an answer for every combination of values of
these attributes. We will say that a knowledge base has comprehensive
coverage if the set of attributes of rules in the knowledge base covers most
of the attributes used in the domain. For instance, we may include in the
knowledge base all the attributes used in NIMA’s Vector Product Format (VPF)
to get comprehensive coverage. Nevertheless, this knowledge base may not be
complete because the rules cannot produce answers for many questions
formulated as AND combinations of VPF attributes.
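
The notion of completeness can be made concrete with a small sketch. It assumes binary attributes and a hypothetical rule format in which each rule is a pair (conditions, conclusion), with conditions given as a dictionary of required attribute values; the function names are also hypothetical.

```python
from itertools import product


def answer(rules, case):
    """Conclusion of the first rule whose conditions all hold, else None."""
    for conditions, conclusion in rules:
        if all(case[name] == value for name, value in conditions.items()):
            return conclusion
    return None


def uncovered_cases(rules, attributes):
    """All combinations of binary attribute values that no rule answers."""
    missing = []
    for values in product((0, 1), repeat=len(attributes)):
        case = dict(zip(attributes, values))
        if answer(rules, case) is None:
            missing.append(case)
    return missing
```

A knowledge base is complete in the sense defined above exactly when uncovered_cases returns an empty list; the enumeration visits all 2^n combinations, so it is practical only for a modest number of attributes.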

However, there are some positive examples of a complete knowledge
base. For instance, in medical imaging [Kovalerchuk, et al., 1996, 2001],
given 11 binary attributes of X-ray images (mammograms) entered into the
knowledge base for a particular patient, the knowledge base should output
one of the target values: “highly suspicious for malignancy” or “probably
benign.” Another target with the same 11 binary features for the same patient
is biopsy (should a biopsy be done or not). Both of these questions are
life-critical questions (more than 100,000 die each year of breast cancer in
the US). An axiom “mined” from an experienced radiologist may look like
the following:

If variation in shape of calcifications is marked AND the number of
calcifications is between 10 and 20 AND the irregularity in shape of
calcifications is moderate THEN the case is highly suspicious for malignancy.

The common way to extract such rules is to ask an expert to write down
the rules that the expert uses and to provide software for converting the rules
to computer-readable knowledge base (KB) form. However, it is unlikely that
the expert will enter complex rules involving, say, 12 attributes. Often it is
beyond a human’s capabilities to keep in mind more than 5 to 9 attributes for
analysis. Later testing may show that a rule with only 3 attributes is wrong
for some cases and the knowledge base should be refined. The refinement
can take years, and for life-critical applications the systems should not be
used before the process of cleaning rules has successfully finished. The
problem is that this process can be exponential in time. For instance, having
14 binary attributes we may search among 2^14 = 16,384 potential rules like
the rule shown above.

To avoid refining and testing the knowledge base for years, we need to
be sure that the set of rules is complete enough from the beginning. Asking
the expert whether he or she believes that the 10 rules entered are complete
may not be the right choice. We need to be sure that the remaining
16,384 - 10 = 16,374 potential rules are not rules at all. Thus, DARPA’s
design time should also measure both the rules included in the knowledge base
and the rules rejected. If we know that something is not a rule, this is also
useful knowledge. There is a big difference between a rejected rule and a rule
unconfirmed by the expert or not yet tested against independent data.
DARPA’s current goal of 1,000,000 axioms would correspond to the design of a
complete knowledge base with less than 23 binary attributes. More exactly, in
the worst case scenario for 22 attributes we may need to record 1,144,066
axioms (using the formula from the Hansel Lemma [Kovalerchuk et al, 1996,
2001]), which exceeds 1,000,000 axioms. To find all these axioms (rules) we
may need to search in a much larger set of possible axioms, which is 2^22.
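
The counts used in this argument can be reproduced with a few lines. The sketch assumes the commonly stated form of Hansel's bound for monotone Boolean functions of n variables, C(n, ⌊n/2⌋) + C(n, ⌊n/2⌋+1); the function names are hypothetical. In this form the bound for n = 22 evaluates to 1,352,078, which differs from the figure quoted above (that figure may come from a different variant of the formula) but likewise exceeds 1,000,000.

```python
from math import comb


def potential_rules(n):
    """Number of binary vectors (candidate AND-rules) over n binary attributes."""
    return 2 ** n


def hansel_bound(n):
    """Worst-case number of expert questions under the assumed form of
    Hansel's lemma: C(n, floor(n/2)) + C(n, floor(n/2) + 1)."""
    return comb(n, n // 2) + comb(n, n // 2 + 1)


for n in (11, 14, 17, 21, 22):
    print(n, potential_rules(n), hansel_bound(n))
```

The gap between the two columns illustrates why monotonicity matters: the worst-case number of questions grows much more slowly than the number of potential rules.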
Let us note that the problem of designing a complete knowledge base
above the limit of 20 attributes (10^6 potential rules) is especially critical
when knowledge is not presented in any printed form (book, articles) and
should be extracted from an expert as a sole knowledge body. This is the
case of the virtual expert for the imagery conflation/registration problem.



3. GOALS AND IVES SYSTEM ARCHITECTURE 

The goals of this chapter are to determine how to build an Imagery Virtual
Expert System (IVES) for imagery analysis, to create tools to capture
imagery-specific information and knowledge for IVES, and to create tools to
foster intelligent consultation with IVES.

We discuss specific tasks and methods that represent three views of the 
system: an imagery analyst view (end user view), knowledge engineer’s 
view (system support view), and tools view (developer’s view). 

The support of an imagery analyst’s view means defining imagery
analysis from the Decision Making Level to the Subpixel Level, implementing
analysis tools, and providing quality assurance tools and specific tools
for imagery conflation.

The support of a knowledge engineer’s view includes providing tools for
discovering imagery analysis rules, including recording the conflation process
and discovering conflation rules.

The support of a developer’s view means providing a conceptual and
algorithmic base for building methods and software for mission-specific
solutions that include:

• new optimized rule extraction procedures using monotone Boolean
functions;

• a contradiction analysis method for extracted rules and for deconflicting
rules obtained from different sources;

• a recording method for capturing expert knowledge on-the-fly;

• Image-DAML (I-DAML) language, DAML ontologies, and agents.

IVES architecture contains three integrated components: 

• Analysis and Recording Tool (ART), 

• Multi-Image Knowledge Extractor (MIKE), and 

• Joint Outline Notator with icon markup (JON). 

The general design of the IVES system is presented in Figure 2. The
complete system design contains seven components. The first component,
interactive on-the-fly recording of the expert’s actions during
registration/conflation, serves as the major source of raw information for
rule generation. It is also useful as a quality control tool for the analyst’s
conflation results (a kind of airplane “black box”). An interactive optimized
rule generation procedure based on the theory of monotone Boolean functions
(TMBF) is intended to speed up direct generation of rules by the imagery
analyst. These two components are implemented in Java as a web portal.



[Figure 2 diagram. Recoverable block labels: (1) ART: interactive on-the-fly
recording of expert’s actions during registration/conflation; (2) ART-MIKE:
interactive recording of imagery facts and expert opinions from images and
texts (context recording); (3) Data mining; (7) ART-MIKE: imagery problem
solving (conflation, change detection, ATR, ...); JON: Joint Outline
Notator.]

Figure 2. Imagery Virtual Expert System (IVES) architecture

These components are described in more detail in the following sections.
The data mining component (block 3) is designed to generalize the record of
an expert’s actions. The results of such generalizations are conflation rules.
The major problem for successful mining of rules from recordings is that
the system actually records the expert’s lower level actions, such as rotation,
scaling, and translation. Mining these lower level rules may not be very
beneficial; thus the system provides a tool for the expert to record an
identification of upper level categories such as “selecting main feature”,
“conflating main features”, etc. The system design includes recording the
expert’s actions and mined rules in XML format; for rules it is RML (Rule
Markup Language).
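
A recording of this kind could be serialized as in the sketch below. The element and attribute names (recording, category, action, type) are hypothetical and chosen only for illustration; they are not the schema used by IVES and not RML.

```python
import xml.etree.ElementTree as ET


def actions_to_xml(recording):
    """Serialize [(category, [(operation, params), ...]), ...] to an XML string.

    Lower level actions (rotation, scaling, translation) are grouped under
    the upper level category that the expert assigned to them.
    """
    root = ET.Element("recording")
    for category, steps in recording:
        group = ET.SubElement(root, "category", {"name": category})
        for operation, params in steps:
            attributes = {"type": operation}
            attributes.update({key: str(value) for key, value in params.items()})
            ET.SubElement(group, "action", attributes)
    return ET.tostring(root, encoding="unicode")


example = [
    ("selecting main feature", [("markup", {"shape": "polyline"})]),
    ("conflating main features", [("rotate", {"angle": 90}),
                                  ("scale", {"sx": 1, "sy": 2})]),
]
xml_text = actions_to_xml(example)
```

Grouping the low level actions under the expert-assigned category keeps both levels in one record, so that later mining can work with the upper level categories.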

Automatic retrieval and recording of facts from written sources (Block
(3)) is designed to fulfill functions similar to those of the PARKA project
[VLKB, 1998]. At this moment, we do not consider this source as a main source
to fill the KB with facts related to registration and conflation of images,
but potentially it can provide useful facts for the KB. In contrast,
interactive recording of imagery facts from images and texts (Figure 2, block
(6)) can be one of the major sources of conflation-related facts right now.
The reason is that such recording provides context for conflation and
registration tasks.


4. INTERACTIVE ON-THE-FLY ANALYSIS AND
RECORDING 

The IVES system component called the Interactive on-the-fly Analysis
and Recording Tool (ART) is implemented as a web portal that allows an
expert to conflate images while the knowledge demonstrated in conflating
these images is recorded on-the-fly. A part of IVES is also implemented as
an ArcMap Plug-in. ART currently allows the user to load a set of images
and conflate these images using basic scaling, translation and rotation tools.

The user can view these images overlapped and change the opacity of the
images for conflating. The system provides facilities for marking up
sections of the images using various shape tools. Each of these markups can be
named by the user using a basic name of his/her choice or by choosing from
a list of predefined terms from a variety of ontologies. Such marking makes
it possible to build a bridge between the image and the domain that will
describe the image in a deeper context.

Currently, the system ontology base includes three ontologies:

• DAML-OWL Geofile ontology [DAML Geofile, 2001],

• DAML-OWL CIA World Fact book ontology [DAML WFB, 2002],

• DAML-OWL NIMA Feature and Attribute Coding Catalogue
[FACC].

Other ontologies can also be loaded. The Geofile ontology consists of about
70 terms on the low level and 6 terms on the top level of the tree next to the
root. The World Fact book ontology consists of about 190 terms on two levels,
and the Feature and Attribute Coding Catalogue ontology consists of 540
terms on the low level, 59 terms on the next level and 8 terms on the
level next to the tree root. ART supports a tree view of these ontologies with
the following functionality: browsing, editing (adding new terms and deleting
terms), expanding and shrinking the tree view, adding icons to terms and
dragging icons to images.

Selected parts of images can be marked up with terms from these
ontologies (see block 1 in Figure 3). There is also a detailed list of actions
the user has performed that can be undone and redone to any point. A basic
magnifier is available for taking a detailed look at the image. Any of these
markups and conflations can be applied to multiple images, which allows two
(or more) already conflated images to be conflated to a third image or simply
to zoom in on all images. All of these actions are recorded, can be presented
in a human-readable form, and are available for playback.
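
The undo, redo and playback behavior described above can be sketched with two stacks. This is a generic illustration, not the ART implementation; the class name ActionRecorder and its methods are hypothetical.

```python
class ActionRecorder:
    """Keeps an ordered log of user actions with undo, redo and playback."""

    def __init__(self):
        self._done = []    # actions currently in effect, oldest first
        self._undone = []  # actions that were undone and can be redone

    def record(self, name, **params):
        """Log a new action; a new action discards the redo history."""
        self._done.append((name, params))
        self._undone.clear()

    def undo(self):
        if self._done:
            self._undone.append(self._done.pop())

    def redo(self):
        if self._undone:
            self._done.append(self._undone.pop())

    def playback(self):
        """Actions in the order in which they should be replayed."""
        return list(self._done)

    def describe(self):
        """Human-readable listing of the recorded actions."""
        return "\n".join(f"{i}. {name} {params}"
                         for i, (name, params) in enumerate(self._done, 1))
```

Replaying the list returned by playback on the original images reproduces the state reached by the expert, which is what makes the log usable as a "black box" record.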

ART includes three categories of tools: basic tools, conflation tools and
on-the-fly user action recording tools. Basic and conflation tools allow the
user to load a set of images and then conflate them using scaling, translation
and rotation, affine control points, and shape-based conflation tools.
These tools provide a foundation for the interactive tool for on-the-fly
recording of the expert’s actions, which is implemented as a web portal. Thus
the system allows an expert to conflate images while the expert knowledge
demonstrated in conflating these images is recorded on-the-fly.

The user can view these images overlapped and change the opacity of the 
images for conflating. The system provides facilities for marking up the sec- 
tions of the images using various shape tools. Each of these markups can be 
named by the user using a basic name of his/her choice or by choosing from 
a list of predefined terms from one of the ontologies, e.g., DAML-OWL 
Geofile ontology. There is also a detailed list of actions the user has per- 
formed that can be undone and redone to any point. A basic magnifier is 
available for taking a detailed look at the image. Any of these markups and 
conflations can be applied to multiple images to allow two (or more) images 
to be conflated. All of these actions are recorded, can be presented in a hu- 
man readable form and are available for playback. Figure 3 shows a confla- 
tion sample with user action recording using ART. 





Figure 3. Image of sample conflation using the case recorder tool 

The list below presents comments on the numbered items shown in Figure 3:

1. The Markup tools can be used to mark major features on the image and
name them; these names can be defined in a variety of ontologies.

2. The move, resize, and rotate tools are basic operations used to conflate
images.

3. The Auto Conflate tool allows the user to choose 3 points on two images
and, using transformations, match those points together (and hopefully
the images as well).

4. The Show recording button allows the user to see how the conflation
was broken up into segments according to the mode that was used in each
segment of the conflating.

5. The Opacity slider is used to set the transparency of the images, allow- 
ing one image to be seen through another. 

6. The checkboxes are used to select the image(s) that will receive the op- 
erations such as move or draw polyline. This allows a user to resize or 
rotate both images together, or once two images have been conflated to- 
gether, a third image can be brought in and conflated against the other 
two together. 



5. MULTI-IMAGE KNOWLEDGE EXTRACTOR 



5.1 Components and architecture 



A Multi-Image Knowledge Extractor (MIKE) assists an imagery analyst
in rule extraction and recording. The common way to extract rules is
asking an expert to write down rules and providing software for converting
the rules to computer-readable knowledge base (KB) form. The major problems
with this straightforward approach are that:

• Typically experts have limited time available for rule entering,
refining, testing and debugging.

• The refinement time can grow exponentially with adding more
attributes.

• Experts are unlikely to enter complex rules because it is difficult to
keep in mind more than 7±2 attributes.

• In life-critical applications, the process of rule refinement and
debugging has to finish before the system is used.

The complete IVES system design contains seven units shown in Figure
2 above. Unit (1) serves as the major source of raw information for rule
generation. Unit (2) supports interactive optimized rule generation. The rule
generation contains five steps, also depicted in Figure 4:

• Interactive recording of characteristics to be used as arguments of
rules;

• Interactive optimized rule generation based on the theory of monotone
Boolean functions;

• Recording test cases for testing rules;

• Testing rules against test cases; and

• Recording rules to the knowledge base of imagery
registration/conflation expertise.




Figure 4. Interactive rule generation 



5.2 Interactive optimized rule generation and testing 

To model expert knowledge, an interactive optimized rule generation
mechanism is implemented in MIKE. It is based on the theory of monotone
Boolean functions [Kovalerchuk et al., 1996, 2001], where a set of binary
vectors represents combinations of image characteristics as inputs of
monotone Boolean functions. This approach has previously been successful in a
medical application [Kovalerchuk et al., 2001]. The medical imagery analyst
(radiologist) was asked only 40 questions, and a complete set of rules (out of
a potential 2048 questions) was extracted in just 30 minutes.

A prototype web-based medical expert consultation system has been created.
Similarly, a prototype web-based conflation imagery virtual expert has
been created and is available for research and education purposes from the
book website at http://www.cwu.edu/~borisk/bookVis.

The logic of rule generation based on the theory of monotone Boolean
functions is as follows. Assume that the analyst identified n rule
arguments (parameters). The example parameters shown in Figure 5 are for
the task of creating expert rules to judge whether two images can be
conflated by using a single affine linear transformation. Figure 5 contains
14 parameters such as: (1) a simple dominant geometric feature exists,
(2) a simple unique geometric feature exists, (3) asymmetric unique
features exist, and so on.

Now the imagery expert can be asked: if all 14 arguments (parameters) are
true for a pair of images, would he/she conclude that the two images can be
conflated with a single affine linear transformation that most likely is
unique? If the answer is “yes,” then it is encoded as 1. Taking into
account that all 14 arguments are Boolean too, we match the Boolean vector
(11111111111111) to (1). We can consider a subset of the 14 parameters and
ask the expert the same question about this subset. For instance, we can
ask about the situation represented by the vector (11011101110111), where
the “0” in the third position indicates that we do not require parameter
#3 to be true. The total number of subsets (and questions) could be
$2^{14}$.

Assume that we built a system by asking an expert only some of these
$2^{14}$ questions. Then, if a real conflation case presents a situation
(11011101110111) that was not asked about, the incomplete knowledge base
does not provide the answer and the conflation task cannot be solved even
if the situation is solvable. The theory of monotone Boolean functions
allows us to avoid asking all $2^{14}$ questions and still generate a
complete set of rules. A specific example of rules built using the MIKE
system is described in Chapter 20.

Figure 5. Case parameter recording

Two fundamental ideas make this optimization of the expert interview
process possible: (1) a dynamic sequence of questions, in which each
further question depends on the expert’s answers to previous questions,
and (2) rule arguments designed in such a way that the property of
monotonicity is fulfilled.

Monotonicity means that if the expert believes that in the situation
represented by the vector v = (11011101110111) there is no way to have a
single affine linear transformation, then for every situation in which some
“1” in v is replaced by “0” the answer should be negative too. This means
that we can eliminate a large number of questions to a rational expert.
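The saving from a single negative answer can be made concrete. The following is a minimal sketch in Python, not the MIKE implementation; the function names `leq` and `implied_negatives` are illustrative assumptions. It enumerates all vectors that are obtained from v by replacing some “1” with “0” and therefore inherit the answer “no”.

```python
from itertools import product

def leq(u, v):
    """True if u equals v or is obtained from v by replacing some 1s with 0s."""
    return all(a <= b for a, b in zip(u, v))

def implied_negatives(v):
    """All vectors (including v) whose answer must be "no" once v is answered "no"."""
    return [u for u in product((0, 1), repeat=len(v)) if leq(u, v)]

v = tuple(int(c) for c in "11011101110111")
implied = implied_negatives(v)
total_questions = 2 ** len(v)

print(len(implied))      # 2048, i.e. 2**11 because v contains 11 ones
print(total_questions)   # 16384 possible questions for 14 parameters
```

Under the monotonicity assumption, one “no” for this particular v settles 2048 of the 16384 possible questions, which is why a well-chosen question order matters.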

The expert interactively defines the rules, which are encoded by the
system of binary vectors. Next, the rules are tested against a database of
test cases for which the answer is known in advance. In image conflation,
the known cases can be cases that are actually georeferenced, so that the
correct transform can be computed from reference points. This would be
testing against georeferenced data.

Initial steps. Screenshots of the MIKE system are shown in Figures 6 and
7 below. Figure 6(a) shows recording parameters and Figure 6(b) shows
loading the sequence of questions to be asked of an expert imagery analyst
using the property of monotonicity.





(a) Parameter editor (b) Question sequence loader

Figure 6. Parameter recording and loading question sequence




Defining rules. In Figure 7(a), each column is a question asked of the
expert that is used by the system to build conflation rules in the form:
if <conditions for Image 1 and Image 2> then it is highly possible that
<one and only one (unique) affine transform that conflates images 1 and 2
exists>. Colored columns are questions already answered by the expert
(green columns indicate “yes” answers and red columns indicate “no”
answers). The current question is shown in the first (left-most) column.
The expert presses the “yes” or “no” button for the current question,
which determines a green or red answer. After coloring the answer, the
system shifts all columns to the right and shows a new question to the
expert in the first column.




(a) Defining Expert Rules (b) Testing rules against image set

Figure 7. Defining expert rules and case parameter recording. The log in
panel (b) lists seven test cases, each with the forecast “Number of
possible mappings is equal to 1”: cases 1-3 are marked correct and cases
4-7 incorrect (Number Correct: 3; Number of False Positives: 0).



Sequencing questions. The question sequence is loaded into the system as
a text file, as shown in Figure 6(b). A knowledge engineer can select
another file that provides another sequence of questions. In this way, the
sequence that is most appropriate for the given expert and problem can be
used. The system does not simply show the next question from the file but
generates it dynamically; that is, the next question is eliminated if the
answer to that question can be derived from the expert’s previous answers.
Elimination is based on the principle of monotonicity described in this
chapter. The question log is presented at the bottom of the screenshot in
Figure 7(a). It is updated after each answer and shows the index of the
question that was eliminated from the question sequence presented to the
expert.
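The dynamic sequencing described above can be sketched as follows. This is a hedged illustration rather than the MIKE code: the sequence is an in-memory list instead of a text file, the expert is simulated by a hypothetical function, and the names `inferred_answer` and `run_interview` are assumptions introduced here.

```python
def leq(u, v):
    """True if every 1 in u is also a 1 in v."""
    return all(a <= b for a, b in zip(u, v))

def inferred_answer(q, answers):
    """Return True/False if monotonicity already fixes the answer to q, else None."""
    for v, yes in answers.items():
        if yes and leq(v, q):        # q keeps every 1 of a "yes" vector
            return True
        if not yes and leq(q, v):    # q lies below a "no" vector
            return False
    return None

def run_interview(sequence, ask):
    """Walk the sequence in order; ask only the questions that cannot be derived."""
    answers, log = {}, []
    for index, q in enumerate(sequence):
        derived = inferred_answer(q, answers)
        if derived is None:
            answers[q] = ask(q)
        else:
            answers[q] = derived
            log.append(index)        # index of the eliminated question
    return answers, log

asked = []

def expert(q):
    """Simulated expert (hypothetical): "yes" only if parameters 1 and 2 both hold."""
    asked.append(q)
    return q[0] == 1 and q[1] == 1

sequence = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1),
            (1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0)]
answers, log = run_interview(sequence, expert)
print(len(asked), log)   # 4 [4, 5, 6, 7]
```

With three parameters and this ordering, the simulated expert answers four questions and the remaining four are eliminated; a different ordering of the same questions would generally eliminate a different number.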

Automated rule testing. The screenshot in Figure 7(b) shows how a rule R
that contains some of the 14 parameters checked (i.e., their conjunction)
is tested against a set of test image pairs
$T = \{\langle I_i, I_j \rangle\}$. The value $R(I_i, I_j) = 1$ is
interpreted as a forecast that images $I_i$ and $I_j$ are most likely
uniquely conflatable by an affine transform.

Each pair in T is associated with a “ground truth” binary flag
$f(I_i, I_j)$, which indicates whether images $I_i$ and $I_j$ are really
uniquely conflatable by an affine transform, that is, whether one and only
one affine transform between $I_i$ and $I_j$ exists in their common parts
within a reasonable error limit.

The log window in Figure 7(b) shows that rule R was only partially
correct. There are pairs of images where the forecasts of R differ from the
$f(I_i, I_j)$ values known for T. Pairs of images from T may have no affine
transform or several different affine transforms within an acceptable
error level. Note that the accuracy of R was evaluated under the
assumption that the parameters are correctly evaluated.
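A rule test of this kind can be sketched in a few lines. The example below is an assumption-laden illustration, not the MIKE code: it uses four hypothetical parameters instead of 14, invented test cases, and the helper names `make_rule` and `evaluate_rule`.

```python
def make_rule(required):
    """Rule R: conjunction of the checked parameters (0-based indices in `required`)."""
    def rule(params):
        return int(all(params[i] == 1 for i in required))
    return rule

def evaluate_rule(rule, cases):
    """cases: list of (params, ground_truth) pairs.
    Returns (correct, false_positives, false_negatives)."""
    correct = false_pos = false_neg = 0
    for params, truth in cases:
        forecast = rule(params)
        if forecast == truth:
            correct += 1
        elif forecast == 1:
            false_pos += 1
        else:
            false_neg += 1
    return correct, false_pos, false_neg

# Hypothetical test pairs: parameter vector and ground-truth flag f.
cases = [
    ((1, 1, 1, 0), 1),
    ((1, 1, 0, 0), 1),
    ((1, 0, 1, 1), 0),
    ((0, 1, 1, 1), 0),
    ((1, 1, 1, 1), 0),   # R forecasts 1 here, but the pair is not uniquely conflatable
]
rule = make_rule([0, 1])             # R = parameter 1 AND parameter 2
print(evaluate_rule(rule, cases))    # (4, 1, 0)
```

As in the text, the counts are meaningful only if the parameter values recorded for each image pair are themselves correct.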

6. ICONIC MARKUP IN IVES



Iconization of images permits moving image analysis from the realm of
details to simple visual pattern recognition. Dynamically composed icons
display user-oriented relevant data. Grouping icons into a meaningful
organization, similar to storyboards for movie direction, is useful for
reviewing one or more complex image summaries in iconic form. Altogether,
this technique enhances readability and comparability of images. Dynamic
icon interaction moves complex analysis towards drag-and-drop icon
simplicity.

Supplying an analyst with an iconic markup system presents significant
advantages for IVES: increased speed of the imagery analysis process and a
more intuitive understanding of analysis elements compared with more
typical markup systems such as simple polygons. The iconic markup
architecture provides the following functionality:

• Image iconization; 

• Scalable iconic summaries; 

• Iconic storyboards; 

• Aggregate icons; 

• Dynamic icon interaction and 

• Customizable icons. 

Icons can be operated on to compose, customize, change application area,
and change non-image data. A scalable solution using iconic summaries of
images and analyses compresses large volumes of information and makes
them more manageable for analysis.

This is a customizable framework — iconic summaries and notation can be 
designed to fit the analyst’s focus, needs and working style. Tools such as 
Semantic Zooming, Icon Editing and Icon Libraries aid in assuring that the 
user can tailor the system to his/her needs. 

These capabilities are described in more detail in Chapters 8-10 with
application to text annotation by icons, but the technique is equally
applicable to images. An iconic notation system enhances standard markup
with icons and other icon-based mechanisms used to identify aspects of the
markup region and other related information. If icons are added to the
markup, there are instances where an iconic summary is sufficient to give
an overview of and access to details about the image without the need to
present the image. This offers significant space savings and can result in
enhanced comparisons over large numbers of images. Figure 8 illustrates a
combination of standard marking of areas of interest (AOIs) using
rectangles and polygons with iconic annotation.

Figure 8. Iconic notation. Panels: multi-image feature correlation;
detailed markup; iconic summary of image. Icon labels: water surface,
damaged crop, road under water.



The rectangular markup in the top left image in Figure 8 shows the flood
area, with the flood icon in the corner. The smaller rectangular markups
indicate road and crops under flood. After two images have been conflated,
the marked-up areas are automatically transferred to another image taken
before the area was flooded. Such a transfer can help to estimate damage
and plan rescue operations. The detailed markup shown on the left
identifies the flood area in more detail. The iconic summary at the bottom
of Figure 8 can be read as “a flood area with crop damage and a road under
flood”. An analyst can review such annotations before looking at the
actual images in detail, especially after conflation. This can be done
faster than working with the images, which contain much more information,
the majority of which may not be relevant.



7. ICONIC ONTOLOGICAL CONFLATION 

General Iconic Conflation is implemented as an extension of ART-MAKE
software. This software: (1) loads images, (2) loads ontologies including
iconized ontologies, (3) edits ontologies, (4) marks up images using a
selected ontology, (5) marks up images with icons associated with concepts
in ontologies (each image can be annotated using a mixture of ontologies),
(6) generates ontological iconic annotation of images to be able to
compare and conflate images on a conceptual ontological level, (7) stores
marked up images in a database, loads marked up images, runs iconic
conflation to find matched images, and conflates images using an
ontological similarity measure. This architecture is depicted in Figure 9.



Figure 9. Iconic conflation architecture. Blocks: load images; load
iconized ontologies; edit ontologies; mark up images with icons of the
selected ontology; generate ontological iconic annotation of images; store
marked up images and iconic annotations in a database; load marked up
images and iconic annotations; load iconized ontologies; run iconic
conflation.



7.1 Flood case 

Below we describe a conflation task for two images taken from [Lillesand,
Kiefer, 1987]. In essence these images cover the same area, but in one of
them the area is covered by water as a result of a flood. The flood area is
almost half of the image. We assume a scenario where person A annotates
image flood1 independently of person B, who annotates image 2, using the
same or different ontologies. Because persons A and B do not coordinate
the image annotation process, they can pick different terms from
ontologies to mark the same spot in two images. This is a typical
situation in GIS, where the same object may have several individual and
group names associated with it.

We start with person A. We assume that person A’s only goal is to
annotate the image with useful icons so that it can later be used as a
tool in conflation. This person may not even possess the skills to
accomplish conflation.

Screenshots in Figure 10 show the initial steps of the iconic conflation
process after opening a blank workspace of ART, loading an OWL Geofile
ontology and a raster image. The tree-based ontology is marked up with
icons. Later these icons appear in the iconic annotation of the images.
Since Geofile did not come with icons, synthetic ones are used instead.
For simplicity of presentation, the ontology is iconized with numeric
icons. Any other icon can be loaded, including icons located on the web
using its URL. These icons can be raster images or SVG files.

Figure 10. Loading an ontology and image

The user is now ready to drag icons onto the image in order to mark it up.
This is the starting situation for creating the bridge between the image
and the domain information in the form of a DAML-OWL ontology, once an
image and an ontology are loaded. Figure 11 shows the next step, where
several numeric icons have been dragged by the annotating user to mark up
the appropriate locations in the image.

Figure 11. Image with various iconic annotations from the DAML-OWL Geofile
ontology (in the middle).

The next step is to record the annotated image 1. The user can store
annotated images, after which the ART-MIKE software no longer has just a
raster image to work with, but knows the locations of various key features
via iconic annotation. Finally, the annotation is committed to a database,
from which it can later be pulled as reference data. Person A’s goal for
this image is accomplished.

Now we move onto person B. Person B just received a photo of a flooded 
area (shown in Figures 12 and 13 next to the first image), and needs to know 
what was beneath that water area. 





Figure 13. Loaded annotated images 



These steps show the completion of one task (annotating images) and the
start of another task (conflating images). The expert asks the system for
a suggested match. The software tries to determine a match based on the
similarities of the annotations in each image and shows the results for
the expert to evaluate.

An automated approach to matching iconic annotations from one image to
another is based on similarities of ontological concepts in the ontology
tree. The similarity of two annotations (annotated features), $F_1$ and
$F_2$, is measured by the upper matching category (UMC) in the ontology,
that is:

(1) the node itself, if both nodes are the same,

(2) one of the nodes, if one of the nodes is an ancestor of the other, and

(3) the closest common ancestor of both annotations otherwise.

For instance, in the iconic summary at the bottom of Figure 12 (see also
Figure 14(a)), concept #62 (other installation) and concept #60 (air
landing area) have concept #76 (geographic location) as their closest
ancestor. Concepts #76 and #64 (sea area) have #76 as their closest common
ancestor, and concept #59 (supply area), which appears in both images, has
itself as UMC. The match level for each pair of these iconized concepts is
the level of the UMC in the ontology tree. For instance, node #76 is the
root (level 1) and node #59 belongs to the next level, level 2.
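The UMC definition can be sketched as a walk up the ontology tree. The code below is an illustration under stated assumptions: the `parent` map is a hypothetical, flattened fragment in which concepts #59, #60, #62 and #64 hang directly under the root #76 (the text only confirms that #76 is the root and #59 is on level 2), and the function names are introduced here for illustration.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to the root, inclusive."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path

def umc(a, b, parent):
    """Upper matching category: the node itself if a == b, the ancestor if one
    node is an ancestor of the other, otherwise the closest common ancestor."""
    on_path_a = set(path_to_root(a, parent))
    for node in path_to_root(b, parent):   # walk upward from b
        if node in on_path_a:
            return node
    raise ValueError("nodes are not in the same tree")

def level(node, parent):
    """Level in the ontology tree; the root has level 1."""
    return len(path_to_root(node, parent))

# Hypothetical fragment of the ontology tree (child -> parent).
parent = {76: None, 59: 76, 60: 76, 62: 76, 64: 76}

for a, b in [(59, 59), (62, 60), (76, 64)]:
    category = umc(a, b, parent)
    print(a, b, category, level(category, parent))
# 59 59 59 2
# 62 60 76 1
# 76 64 76 1
```

The three pairs reproduce the categories 59, 76, 76 and levels 2, 1, 1 quoted in the text; in a deeper tree the same function returns deeper categories for closely related concepts.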



(a) Ontological match for Figure 13

Category:     59  76  76
Level:         2   1   1
Flood1.jpg:   59  62  76
Flood2.jpg:   59  60  64

(b) Ontological match for Figure 15

Category:     59  76  76
Level:         2   1   1
Flood1.jpg:   14  64  66
Flood2.jpg:   11  67  76

Figure 14. Matching ontological icon categories




Figure 15. Loaded images marked up differently 



Thus, we have three matched points (identified by matched icon locations)
in two images, and an affine transform can be computed to conflate them.
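Three non-collinear point pairs determine an affine transform uniquely. The following minimal sketch solves for its six coefficients with Cramer's rule using only the Python standard library; the icon coordinates are invented for illustration, and the function names are assumptions, not part of the software described here.

```python
def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(m, rhs):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("points are collinear; the affine transform is not unique")
    solution = []
    for col in range(3):
        mc = [list(row) for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution

def affine_from_points(src, dst):
    """Coefficients (a, b, c, d, e, f) with x' = a*x + b*y + c, y' = d*x + e*y + f."""
    m = [[x, y, 1.0] for x, y in src]
    a, b, c = solve3(m, [x for x, _ in dst])
    d, e, f = solve3(m, [y for _, y in dst])
    return a, b, c, d, e, f

def apply_affine(coeffs, point):
    a, b, c, d, e, f = coeffs
    x, y = point
    return a * x + b * y + c, d * x + e * y + f

# Hypothetical matched icon locations in image 1 (src) and image 2 (dst).
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(2.0, 3.0), (4.0, 3.0), (2.0, 6.0)]
coeffs = affine_from_points(src, dst)
print(coeffs)                            # (2.0, 0.0, 2.0, 0.0, 3.0, 3.0)
print(apply_affine(coeffs, (1.0, 1.0)))  # (4.0, 6.0)
```

If the three matched icons lie on one line, the system is singular and the sketch raises an error, which corresponds to the case where the matched points do not fix a unique transform.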
Now we can assume that two other persons, C and D, marked up the same
images as shown in Figure 15. Figure 14(b) shows the ontological match of
these markups and Figure 16 shows the result of their conflation using
this match.




Figure 16. Result of affine conflation based on ontological match and its accuracy 



7.2 Historic maps case 

The same iconic ontological approach has been used to conflate historic
maps using the same DAML-OWL Geofile ontology with the terms: City (7),
Operating Area (9), Bay (32), Port (35), Dock (39) and Sea area (64). As
can be seen below, simple manipulation of a bitmap might not be enough to
get everything matched up. In these cases the maps are hand drawn and
highly inaccurate. Attempts to use scaling, translation, and rotation will
fail, but ontological matching can still be correct. Figure 17(a) shows two
images to be conflated: a modern Macau map, 2003 (on the left), and a
historic Portuguese map, 1889 (on the right), from the collection of the
Library of Congress [Macau, 2003]. Figure 17(b) shows two images conflated
and a third map not yet conflated with the two already conflated. This
figure indicates the need for non-linear transformation.

(a) Two images to be conflated (b) Two images linearly conflated with the
third image on the left to be conflated

Figure 17. Two images to be conflated [Macau, Library of Congress, 2003].
See also color plates.



7.3 Algorithm for finding best ontological match for conflation

The algorithm consists of seven steps:

Step 1. Collect all icons that mark up the images to be conflated.

Step 2. Identify the ontology terms {t} that are matched to the icons and
the locations of these terms in the ontology tree.

Step 3. Compute the matrix $M = \{m_{ij}\}$, where $m_{ij}$ is a similarity
measure between terms $t_i$ and $t_j$ in the ontology tree. Values
$m_{ij}$ are computed for every pair of terms $(t_i, t_j)$, where each
term $t_i$ is from image 1 and each term $t_j$ is from image 2.

Step 4. Find the smallest element of every row of matrix M and order the
set of these elements, L.

Step 5. Select the three smallest elements from L and test that they are
from different rows and columns (DRC) in M. If this is the case, then the
best match between terms in the two images is considered found. For
instance, if these three elements are $m_{23}$, $m_{45}$ and $m_{17}$,
then the matched terms are $(t_2, t_3)$, $(t_4, t_5)$ and $(t_1, t_7)$,
and the locations of the respective icons are used for identifying the
matching affine transform between the images.

Step 6. If the first three elements fail the DRC test, then test other
triples until a DRC triple is found or the set L is exhausted.

Step 7. If W is the set of DRC triples found, then find a triple
$\langle m_{ij}, m_{dk}, m_{wv} \rangle$ with the smallest sum
$m_{ij} + m_{dk} + m_{wv}$ in comparison with the other triples. This
triple is called a conflation solution.
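Steps 4-7 can be sketched compactly. The code below is one possible reading of the steps, not the authors' implementation: it assumes that a smaller matrix value means a closer match (as with a node-count measure; a level-based measure, where deeper is better, would first have to be converted, for example by negation), and the matrix values and the name `best_match` are invented for illustration.

```python
from itertools import combinations

def best_match(M):
    """Return the DRC triple with the smallest sum as [(row, col, value), ...],
    or None if no triple from different rows and columns exists."""
    # Step 4: smallest element of every row, ordered by value.
    L = sorted((min(row), i, row.index(min(row))) for i, row in enumerate(M))
    # Steps 5-6: examine triples from L, starting with the three smallest;
    # rows differ by construction, so only the columns need to be checked.
    W = [t for t in combinations(L, 3) if len({col for _, _, col in t}) == 3]
    if not W:
        return None
    # Step 7: the conflation solution is the DRC triple with the smallest sum.
    best = min(W, key=lambda t: sum(value for value, _, _ in t))
    return [(i, j, value) for value, i, j in best]

# Hypothetical similarity matrix: rows are terms of image 1, columns terms of image 2.
M = [
    [0, 2, 3],
    [1, 4, 3],
    [3, 0, 5],
    [4, 2, 1],
]
print(best_match(M))   # [(0, 0, 0), (2, 1, 0), (3, 2, 1)]
```

In this example the three smallest row minima share column 0, so the first triple fails the DRC test and the solution comes from a later triple, as in Step 6. When the first triple passes the test it also has the smallest sum, so Step 7 returns it unchanged.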

The measure of similarity $m_{ij}$ is computed in two alternative ways:
(1) as an upper matching category (UMC), that is, in essence, the level of
the closest common ancestor (see the description in Section 7.1), and
(2) as the number of nodes between the terms in the ontology tree. We
prefer (1) because the deeper the match level, the higher the chances that
the match is not accidental. The same number of intermediate nodes used in
(2) can occur in both a shallow and a deep match.
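The last remark can be checked on a small example. The sketch below uses a hypothetical tree with made-up node names and illustrative function names; it computes measure (2) for a shallow pair and a deep pair and shows that the node count alone does not distinguish them, while the level of the closest common ancestor does.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to the root, inclusive."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path

def closest_common_ancestor(a, b, parent):
    on_path_b = set(path_to_root(b, parent))
    return next(n for n in path_to_root(a, parent) if n in on_path_b)

def nodes_between(a, b, parent):
    """Measure (2): number of intermediate nodes on the tree path from a to b."""
    common = closest_common_ancestor(a, b, parent)
    edges = path_to_root(a, parent).index(common) + path_to_root(b, parent).index(common)
    return max(edges - 1, 0)

def match_level(a, b, parent):
    """Measure (1): level of the closest common ancestor (root = level 1)."""
    return len(path_to_root(closest_common_ancestor(a, b, parent), parent))

# Hypothetical tree (child -> parent).
parent = {"root": None, "A": "root", "B": "root",
          "A1": "A", "A1a": "A1", "A1b": "A1"}

print(nodes_between("A", "B", parent), match_level("A", "B", parent))          # 1 1
print(nodes_between("A1a", "A1b", parent), match_level("A1a", "A1b", parent))  # 1 3
```

Both pairs are separated by exactly one intermediate node, but the first pair meets only at the root while the second meets at level 3.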



8. CONCLUSION 

In moving toward the goal of preserving human expertise in imagery
analysis and capturing non-formalized context, virtual expert tools have
been developed to assist knowledge engineers and image analysts in
populating the knowledge base of the virtual expert. The first tool
records an imagery analyst’s actions on the fly, assists the analyst in
marking up imagery with iconized ontology terms and provides ontological
image conflation. The second tool generates expert rules by questioning
the imagery analyst and minimizes questioning time using the theory of
monotone Boolean functions. These tools are implemented as a web portal
using Java.

Future work for a knowledge engineer includes developing tools for
discovering imagery analysis rules, implementing tools for recording the
conflation process, and developing tools to apply imagery analysis rules
to conflation.



10. EXERCISES AND PROBLEMS

1. Select two aerial photos from the web of the same area but with
different spatial resolution. Design an iconic annotation for these images
and provide justification for your choice of icons, spatial features and
ontology terms.

2. Build an ontology that will fit the images you used in exercise 1. It
should be a tree with three or more levels.

3. Build a sequence of questions based on four binary parameters $x_1$,
$x_2$, $x_3$, $x_4$, starting from the simplest question that contains
only parameter $x_1$, to be loaded into the knowledge extractor MIKE.

Tip: MIKE accepts binary vectors; the question with $x_1$ can only be
encoded as 1000. Justify your sequence assuming monotonicity of the
expert’s answers.

Figure 1. Information visualizations for presentation and branding. Left:
NASDAQ display. Right: Visual Insights’ eBizLive product for showing
website activity

Figure 2. Executive Dashboard, courtesy of Bill Wright.

Figure 3. Real-time 3D Visual Report, courtesy of Visual Insights

Figure 4. Advizor 2000 Visual Discovery and Analysis Tool

Figure 5. Bar chart scalability is increased by using levels of rendering
detail and a red overplotting indicator at the top of the view.
Scalability in this case facilitates locating and then focusing attention
on particular bars.

Figure 11. Hyperproof [http://www-csli.stanford.edu/hp/Hproof2.html]

Fragment of Figure 12. Task diagram
[http://www-csli.stanford.edu/hp/Hproof3a.html]

Figure 2. Traditional and iconic visualizations of rule R1

Figure 3. Traditional and iconic visualizations of rules

Fragment of Table 3

Fragment of Tables 4 and 8.

Figure 4. Comparison of two visual reasoning alternatives

Figure 5. Reasoning chains

Figure 6. Visual reasoning rules

Figure 11. Integrated visual evidentiary reasoning scheme

Figure 12. Signal ellipses

Figure 13. Visual rules with signal ellipses

Figure 1. Pieter Bruegel’s painting “Blue cloak” (1559), oil on oak panel,
117 x 163 cm (with permission from Staatliche Museen zu Berlin -
Gemäldegalerie, Berlin)

Fragment of Table 1. Encoding text in art

Compressed content of the text: metaphor proverb and icon

Figure 8. Examples of low-level visual correlations based on glyphs

Figure 10. Rainbow correlation

(a) Magic lens (b) Linked panels

Figure 12. High-level geospatial visual correlation for objects of
different levels of resolution.

Table 8. Examples of Bruegel composite icons (columns: icons, composite
icons)

Table 10. Composite icon generation.

Figure 3. Dynamics of compression of iconic sentence.

Figure 6. Icons with user defined weighting

Figure 9. Possible icons for MUC concepts. (a) The slashes across the
lower right indicate that this is a small band (blue scale) of armed
rebels (or terrorists). (b) This icon describes that a fairly large number
of civilian targets (blue scale) were hurt pretty badly (red scale) by
some action.

Figure 10. Bruegel icon examples: base icons

Location: icon type 1. Red shows country’s location.

Location icon with city modifier

Location: icon type 2. Lens shows a small country.

Location icon with town modifier

Figure 14. Bruegel icon examples: location icon types

Figure 15. Bruegel case studies.

Figure 1. a) An illustration of the operation, b) A sketch of a
hyperspectral image set with 169 spectral bands ranging from very short to
very long bands, c) A color infrared (CIR) imagery of the semi-desert area
in Eastern Washington.

Figure 5. Scatterplots generated by MDS using document vectors with sizes
equal to a) 200, b) 100, and c) 50 terms.

Figure 6. Scatterplots generated by MDS using a) 3298, b) 1649, and c) 824
document vectors.

Figure 7. A scatterplot matrix demonstrates the impact of reducing
document vectors (row) versus reducing vector dimensions (column) using
the classical scaling technique.

Vector dimension column labels: D = 200, D = 100, D = 50 (document
vectors); D = 169, D = 84, D = 42 (pixel vectors).

Figure 8. A scatterplot matrix demonstrates the impact of reducing
document vectors (row) versus reducing vector dimensions (column) using
the Sammon Projection technique.

Figure 9. A scatterplot matrix demonstrates the effects of reducing pixel
vectors (row) versus reducing vector dimensions (column) using remote
sensing imagery.

Figure 10. Colors generated by the scatterplot clusters clearly identify
different features of the images shown in Figure 1c

Figure 12. Scatterplots.

Figure 13. An illustration of our multiple sliding window design in
visualizing data streams

Figure 14. Eigenvectors of the scatterplots

ter is represented by a unique random color. B) Corresponding cluster
colors are projected to the map position

Figure 16. The Eigenvectors of the scatterplots

Figure 4. Fragment of EEG containing 100 segments presented by 36 features
in which the EEG-viewer recognized five artifacts.

Figure 2. Data visualization: original data on the left and simultaneously
rescaled data on the right

Figure 11. Breast cancer cases visualized using procedure P3 with cases
shown as bars with frames.

Figure 14. A 3-D version of Monotone Boolean Visual Discovery with
vertical and horizontal surfaces used.

Figure 16. A 3-D version of Monotone Boolean Visual Discovery with
grouping Hansel chains

Figure 2. Local inconsistency between imagery source and vector product

The decision level task: Move troops from the blue point to the red point
using the fastest route (using or not using roads).

Conflict pertaining to objective: Roads have curvature, slope, and width
on the imagery different from the map; thus evaluation of the shortest
path may provide conflicting results.

Figure 5. Decision level conflation problem

Image (a) Image (b)

Figure 16. Illustration of the offset, scaling, and rotation phenomenon at
the scale of warping, i.e., 100s of pixels. Images (a) and (b) are the
time-lapsed image pair. The blue vector maps corresponding river bend
features. The white vector maps the lake. The dark vector maps the
registration vector for the selected features. The lower two figures are
displays of entropy transformed using equation 8 on the corresponding
intensity imagery pair.

Figure 18. Intensity correlation: non-unique and wrong solution. Radiance
intensity-based correlation from a total of 49 locations centered on a
7 x 7 grid over the image pairs in Figure 16. The connection of the
correlation peaks does not have the same trend as the registration vector
displayed in Figure 16. See also color plates.

Figure 19. Entropy correlation. The trend matches the data. Entropy-based
correlation with the same layout as in Figure 18. The connection of the
correlation peaks does have the same trend as the registration vector from
Figure 16. See also color plates.

Figure 8. Illustration of structural lengths

Fragment of Figure 4. Expert questioning using monotonicity principle

Figure 9. Two images at shape extraction stage. One of the images is
scaled disproportionally.

(a) Before conflation. (b) Results of conflation

Figure 14. Conflation of smaller images which are parts of the aerial
photo and map.